The complexity of human interactions with social and natural phenomena is mirrored in the way
we describe our experiences through natural language. In order to retain and convey such
high-dimensional information, the statistical properties of our linguistic output have to be highly correlated
in time. An example is the robust observation, still largely not understood, of correlations on
arbitrarily long scales in literary texts. In this paper we explain how long-range correlations flow from
highly structured linguistic levels down to the building blocks of a text (words, letters, etc.). By
combining calculations and data analysis we show that correlations take the form of a bursty sequence
of events once we approach the semantically relevant topics of the text. The mechanisms we identify
are fairly general and can be equally applied to other hierarchical settings.
Published as: Proc. Nat. Acad. Sci. USA (2012) doi: 10.1073/pnas.1117723109



Literary texts are an expression of the ability of natural language
to project complex and high-dimensional phenomena
into a one-dimensional, semantically meaningful sequence
of symbols. For this projection to be successful,
such sequences have to encode the information in the form
of structured patterns, such as correlations on arbitrarily
long scales [UH]. Understanding how language processes
long-range correlations, a ubiquitous signature of complexity
present in human activities [3HZ] and in the natural
world [81-111], is an important step towards comprehending
how natural language works and evolves. This
understanding is also crucial to improve the increasingly
important applications of information theory and statistical
natural language processing, which are mostly based
on short-range-correlation methods [12-15].

Take your favorite novel and consider the binary sequence
obtained by mapping each vowel into a 1 and all
other symbols into a 0. One can easily detect structures
on neighboring bits, and we certainly expect some repetition
patterns on the size of words. But one should certainly
be surprised and intrigued when discovering that
there are structures (or memory) after several pages or
even on arbitrarily large scales of this binary sequence.
In the last twenty years, similar observations of long-range
correlations in texts have been related to large-scale
characteristics of the novels such as the story being
told, the style of the book, the author, and the language
[UHIHllElttSlIIHlUnilSS]. However, the mechanisms
explaining these connections are still missing (see Ref. [5]
for a recent proposal). Without such mechanisms, many
fundamental questions cannot be answered. For instance,
why did all previous investigations observe long-range correlations
despite their radically different approaches? How
and which correlations can flow from the high-level semantic
structures down to the crude symbolic sequence
in the presence of so many arbitrary influences? What
information is gained on the large structures by looking
at smaller ones? Finally, what is the origin of the long-range
correlations?

In this paper we provide answers to these questions
by approaching the problem through a novel theoretical
framework. This framework uses the hierarchical organization
of natural language to identify a mechanism that
links the correlations at different linguistic levels. As
schematically depicted in Fig. 1, a topic is linked to several
words that are used to describe it in the novel. At
the lower level, words are connected to the letters that
form them, and so on. We calculate how correlations are
transported through these different levels and compare
the results with a detailed statistical analysis of ten different
novels. Our results reveal that while approaching
semantically relevant high-level structures, correlations
unfold in the form of a bursty signal. Moving down in levels,
we show that correlations (but not burstiness) are
preserved, explaining the ubiquitous appearance of long-range
correlations in texts.
FIG. 1: Hierarchy of levels at which literary texts can be
analyzed. Depicted are the levels vowels/consonants (V/C), 
letters (a-z), words, and topics. 
I. THEORETICAL FRAMEWORK

A. The importance of the observable 

In line with information theory, we treat a literary text
as the output of a stationary and ergodic source that
takes values in a finite alphabet and we look for information
about the source through a statistical analysis
of the text [23]. Here we focus on correlation functions,
which are defined after specifying an observable
and a product over functions. In particular, given a symbolic
sequence $s$ (the text), we denote by $s_k$ the symbol
in the $k$-th position and by $s_n^m$ ($m \geq n$) the substring
$(s_n, s_{n+1}, \ldots, s_m)$. As observables, we consider functions
$f$ that map symbolic sequences $s$ into a sequence $x$ of
numbers (e.g., 0's and 1's). We restrict to local mappings,
namely $x_k = f(s_k^{k+r})$ for any $k$ and a finite constant
$r \geq 0$. The autocorrelation function of $f$ is defined as:

$$C_f(t) := \langle f(s_i^{i+r})\, f(s_{i+t}^{i+t+r}) \rangle - \langle f(s_i^{i+r}) \rangle \langle f(s_{i+t}^{i+t+r}) \rangle, \qquad (1)$$

where $t$ plays the role of time (counted in number of
symbols) and $\langle \cdot \rangle$ denotes an average over sliding windows,
see Supporting Information (SI) Sec. I for details.

The choice of the observable $f$ is crucial in determining
whether and which "memory" of the source is being
quantified. Only once a class of observables sharing the
same properties is shown to have the same asymptotic
autocorrelation is it possible to think about long-range
correlations of the text as a whole. In the past, different
kinds of observables and encodings (which also correspond
to particular choices of $f$) were used, from the
Huffman code [25], to attributing to each symbol an
arbitrary binary sequence (ASCII, Unicode, 6-bit tables,
dividing letters in groups, etc.) [H \W j |2"01 Effi [27 ], to the
use of the frequency-rank [7] or parts of speech [19] on the
level of words. While the observation of long-range correlations
in all cases points towards a fundamental source,
it remains unclear which common properties these observables
share. This is essential to determine whether
they share a common root (conjectured in Ref. pQ) and
to understand the meaning of quantitative changes in the
correlations for different encodings (reported in Ref. [16]).
In order to clarify these points we use mappings $f$ that
avoid the introduction of spurious correlations. Inspired
by Voss [IT] and Ebeling et al. 0[5] 00] we use $f_\alpha$'s that
transform the text into binary sequences $x$ by assigning
$x_k = 1$ if and only if a local matching condition $\alpha$ is
satisfied at the $k$-th symbol, and $x_k = 0$ otherwise (e.g.,
$\alpha$ = the $k$-th symbol is a vowel). See SI-Sec. II for specific
examples.
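
As an illustration of such matching conditions, the following minimal Python sketch builds the binary sequence for a letter-level and for a word-level condition. The function names `match_symbols` and `match_string` and the toy text are hypothetical, introduced here only for illustration; the conditions actually used in the analysis are those of SI-Sec. II.

```python
VOWELS = set("aeiou")  # assumption: lower-case English text


def match_symbols(text, symbols):
    """x_k = 1 iff the k-th symbol belongs to `symbols` (local mapping with r = 0)."""
    return [1 if ch in symbols else 0 for ch in text]


def match_string(text, pattern):
    """x_k = 1 iff the substring starting at k equals `pattern` (r = len(pattern) - 1)."""
    n = len(pattern)
    return [1 if text[k:k + n] == pattern else 0 for k in range(len(text))]


# Toy text (hypothetical); a real analysis would use a whole book.
text = "the prince met the old prince "
vowel = match_symbols(text, VOWELS)
e = match_symbols(text, {"e"})
prince = match_string(text, " prince ")
```

Each sequence has the same length as the text, so sequences obtained from different conditions can be compared position by position.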



B. Correlations and burstiness 

Once equipped with the binary sequence $x$ associated
with the chosen condition $\alpha$ we can investigate the
asymptotic trend of its $C_x(t)$. We are particularly interested
in the long-range correlated case

$$C_x(t) := \langle x_j x_{j+t} \rangle - \langle x_j \rangle \langle x_{j+t} \rangle \sim t^{-\beta}, \quad 0 < \beta < 1, \qquad (2)$$

for which $\sum_{t=0}^{\infty} C_x(t)$ diverges. In this case the associated
random walker $X(t) := \sum_{j=0}^{t} x_j$ spreads super-diffusively
as [HI 125]

$$\sigma_X^2(t) := \langle X(t)^2 \rangle - \langle X(t) \rangle^2 \sim t^{\gamma}, \quad \gamma = 2 - \beta. \qquad (3)$$

In the following we investigate correlations of the binary
sequence $x$ using Eq. (3) because integrated indicators
lead to more robust numerical estimations of asymptotic
quantities [TJ HI [TO] E]. We are mostly interested in
the distinction between short- ($\beta > 1$, $\gamma = 1$) and long-
($0 < \beta < 1$, $1 < \gamma < 2$) range correlations. We use
normal (anomalous) diffusion of $X$ interchangeably with
short- (long-) range correlations of $x$.
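
The transport in Eq. (3) can be estimated numerically from a single binary sequence. The sketch below is one possible way to do it, assuming that the average is taken over all sliding windows of length $t$ and that the exponent is obtained from a plain least-squares fit in log-log scale; the names `transport` and `gamma_hat` are hypothetical, and the fitting procedure actually used for the books (SI-Sec. V) is more conservative.

```python
import numpy as np


def transport(x, t):
    """Variance of the walker displacement over all sliding windows of length t."""
    c = np.concatenate(([0], np.cumsum(x)))  # c[i] = x_0 + ... + x_{i-1}
    displacement = c[t:] - c[:-t]            # number of 1's in each window
    return displacement.var()


def gamma_hat(x, t_values):
    """Slope of log sigma^2(t) versus log t (finite-time estimate of gamma)."""
    s2 = [transport(x, t) for t in t_values]
    slope, _ = np.polyfit(np.log(t_values), np.log(s2), 1)
    return slope


# Uncorrelated (Bernoulli) sequence: normal diffusion is expected.
rng = np.random.default_rng(0)
x_random = (rng.random(200_000) < 0.3).astype(int)
g_random = gamma_hat(x_random, [10, 20, 40, 80, 160])
```

For an uncorrelated sequence the estimate should stay close to 1; values clearly above 1 over a broad range of $t$ signal anomalous diffusion in the sense used above.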

An insightful view on the possible origins of the long-range
correlations can be achieved by exploring the relation
between the power spectrum $S(\omega)$ at $\omega = 0$ and the
statistics of the sequence of inter-event times $\tau_j$'s (i.e.,
one plus the lengths of the clusters of 0's between consecutive
1's). For the short-range correlated case, $S(0)$ is
finite and given by [301 E3]:

$$S(0) = \frac{\sigma_\tau^2}{\langle \tau \rangle^3} \left( 1 + 2 \sum_{k=1}^{\infty} C_\tau(k) \right). \qquad (4)$$

For the long-range correlated case, $S(0) \to \infty$ and Eq. (4)
identifies two different origins: (i) burstiness measured as
the broad tail of the distribution of inter-event times $p(\tau)$
(divergent $\sigma_\tau$); or (ii) long-range correlations of the sequence
of $\tau_j$'s (not summable $C_\tau(k)$). In the next section
we show how these two terms give different contributions
at different linguistic levels of the hierarchy.
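
The two ingredients of Eq. (4) can be measured directly on a binary sequence. The following sketch is illustrative only: the helper names are hypothetical, $C_\tau(k)$ is assumed to be the autocorrelation of the inter-event times normalized to 1 at $k = 0$, and the infinite sum is truncated at an arbitrary `k_max`.

```python
import numpy as np


def inter_event_times(x):
    """Distances between consecutive 1's (one plus the number of 0's in between)."""
    return np.diff(np.flatnonzero(x))


def burstiness(tau):
    """Relative width sigma_tau / <tau> of the inter-event time distribution."""
    return tau.std() / tau.mean()


def tau_autocorr(tau, k):
    """Normalized autocorrelation of the inter-event time sequence at lag k >= 1."""
    d = tau - tau.mean()
    return np.mean(d[:-k] * d[k:]) / tau.var()


def s0_estimate(tau, k_max):
    """Right-hand side of Eq. (4), with the sum truncated at k_max (assumption)."""
    c_sum = sum(tau_autocorr(tau, k) for k in range(1, k_max + 1))
    return tau.var() / tau.mean() ** 3 * (1 + 2 * c_sum)


# Uncorrelated sequence with density 0.3 as a reference case.
rng = np.random.default_rng(0)
x_random = (rng.random(200_000) < 0.3).astype(int)
tau_random = inter_event_times(x_random)
b_random = burstiness(tau_random)
s0_random = s0_estimate(tau_random, k_max=5)
```

For this reference case the inter-event times are geometrically distributed and uncorrelated, so the estimate reduces to the variance per symbol of the sequence; a heavy-tailed $p(\tau)$ or a slowly decaying $C_\tau(k)$ would instead make the estimate grow with the sequence length or with `k_max`.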

C. Hierarchy of levels 

Building blocks of the hierarchy depicted in Fig. 1
are binary sequences (organized in levels) and links between
them. Levels are established from sets of semantically
or syntactically similar conditions $\alpha$'s (e.g., vowels/consonants,
different letters, different words, different
topics) [H]. Each binary sequence $x$ is obtained by mapping
the text using a given $f_\alpha$, and will be denoted by
the relevant condition in $\alpha$. For instance, prince denotes
the sequence $x$ obtained from the matching condition
$\alpha : s_k^{k+7} =$ " prince ". A sequence $z$ is linked to
$x$ if for all $j$'s such that $x_j = 1$ we have $z_{j+r'} = 1$, for
a fixed constant $r'$. If this condition is fulfilled we say
that $x$ is on top of $z$ and that $x$ belongs to a higher level
than $z$. By definition, there are no direct links between
sequences at the same level. A sequence at a given level
is on top of all the sequences in lower levels to which
there is a direct path. For instance, prince is on top of e
which is on top of vowel. As will be clear later from our
results, the definition of link can be extended to have a
probabilistic meaning, suited for generalizations to high
levels (e.g., " prince " is more likely to appear while
writing about a topic connected to war).
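
The deterministic link condition can be checked mechanically. In the sketch below the function name `is_on_top` and the toy text are hypothetical; the offset `r_prime` plays the role of the fixed constant $r'$ (6 for the letter "e" inside " prince ", counting from the leading blank).

```python
def is_on_top(x, z, r_prime):
    """True if every x_j = 1 implies z_{j + r_prime} = 1."""
    return all(
        j + r_prime < len(z) and z[j + r_prime] == 1
        for j, xj in enumerate(x)
        if xj == 1
    )


# Toy text (hypothetical) and three binary sequences at different levels.
text = "the prince met the old prince "
prince = [1 if text[k:k + 8] == " prince " else 0 for k in range(len(text))]
e = [1 if ch == "e" else 0 for ch in text]
vowel = [1 if ch in "aeiou" else 0 for ch in text]

prince_over_e = is_on_top(prince, e, r_prime=6)
e_over_vowel = is_on_top(e, vowel, r_prime=0)
vowel_over_e = is_on_top(vowel, e, r_prime=0)
```

The relation is not symmetric: every "e" is a vowel, but not every vowel is an "e", which is what places e above vowel in the hierarchy.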



D. Moving in the hierarchy 

We now show how correlations flow through two linked
binary sequences. Without loss of generality we denote by $x$
a sequence on top of $z$ and by $y$ the unique sequence on top
of $z$ such that $z = x + y$ (sum and other operations are
performed on each symbol: $z_i = x_i + y_i$ for all $i$). The
spreading of the walker $Z$ associated with $z$ is given by

$$\sigma_Z^2(t) = \sigma_X^2(t) + \sigma_Y^2(t) + 2C(X(t),Y(t)), \qquad (5)$$

where $C(A,B) = \langle AB \rangle - \langle A \rangle \langle B \rangle$ is the cross-correlation.
Using the Cauchy-Schwarz inequality $|C(X(t),Y(t))| \leq \sigma_X(t)\sigma_Y(t)$ we obtain

$$\sigma_Z(t) \leq \sigma_X(t) + \sigma_Y(t). \qquad (6)$$

Define $\bar{x}$ as the sequence obtained by reverting $0 \leftrightarrow 1$ on
each of its elements, $\bar{x}_i = 1 - x_i$. It is easy to see
that if $z = x + y$ then $\bar{x} = \bar{z} + y$. Applying the
same arguments as above, and using that $\sigma_{\bar{X}} = \sigma_X$ for
any $x$, we obtain $\sigma_X(t) \leq \sigma_Z(t) + \sigma_Y(t)$ and similarly
$\sigma_Y(t) \leq \sigma_Z(t) + \sigma_X(t)$. Suppose now that $\sigma_i^2 \sim t^{\gamma_i}$ with
$i \in \{X, Y, Z\}$. In order to satisfy simultaneously the
three inequalities above, at least two out of the three $\gamma_i$
have to be equal to the largest value $\max_i\{\gamma_i\}$. Next we
discuss the implications of this restriction for the flow of
correlations up and down in our hierarchy of levels.
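
The step from the three inequalities to the restriction on the exponents can be spelled out as follows (a sketch of the argument, using only the asymptotic scaling assumed above). Inserting $\sigma_i(t) \sim t^{\gamma_i/2}$ into the inequality for $\sigma_Z$ gives, for large $t$,

$$t^{\gamma_Z/2} \lesssim t^{\gamma_X/2} + t^{\gamma_Y/2} \quad \Longrightarrow \quad \gamma_Z \leq \max\{\gamma_X, \gamma_Y\},$$

and the other two inequalities give in the same way

$$\gamma_X \leq \max\{\gamma_Z, \gamma_Y\}, \qquad \gamma_Y \leq \max\{\gamma_Z, \gamma_X\}.$$

If a single exponent, say $\gamma_X$, were strictly larger than the other two, the second relation would read $\gamma_X \leq \max\{\gamma_Z, \gamma_Y\} < \gamma_X$, a contradiction. Hence the largest value has to be attained by at least two of the three exponents.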

Up. Suppose that at a given level we have a binary
sequence $z$ with long-range correlations $\gamma_Z > 1$. From
our restriction we know that at least one sequence $x$ on
top of $z$ has long-range correlations with $\gamma_X \geq \gamma_Z$. This
implies, in particular, that if we observe long-range correlations
in the binary sequence associated with a given
letter then we can argue that its anomaly originates from
the anomaly of at least one word where this letter appears,
higher in the hierarchy [42].

Down. Suppose $x$ is long-range correlated, $\gamma_X > 1$.
From Eq. (5) we see that a fine-tuned cancellation with the
cross-correlation must appear in order for the lower-level
sequence $z$ (down in the hierarchy) to have $\gamma_Z < \gamma_X$.
From the restriction derived above we know that this is
possible only if $\gamma_X = \gamma_Y$, which is unlikely in the typical
case of sequences $z$ receiving contributions from different
sources (e.g., a letter receives contributions from different
words). Typically, $z$ is composed of $n$ sequences $x^{(j)}$,
with $\gamma_{X^{(1)}} \neq \gamma_{X^{(2)}} \neq \ldots \neq \gamma_{X^{(n)}}$, in which case
$\gamma_Z = \max_j\{\gamma_{X^{(j)}}\}$. Correlations typically flow down in our
hierarchy of levels.

Finite-time effects. While the results above are valid
asymptotically (infinitely long sequences), in the case of
any real text we can only have a finite-time estimate $\hat{\gamma}$ of
the correlations $\gamma$. Already from Eq. (5) we see that the
addition of sequences with different $\gamma_{X^{(j)}}$, the mechanism
for moving down in the hierarchy, leads to $\hat{\gamma}_Z < \gamma_Z$ if $\hat{\gamma}_Z$
is computed at a time when the asymptotic regime is
still not dominating. This will play a crucial role in our
understanding of long-range correlations in real books.
In order to give quantitative estimates, we consider the
case of $z$ being the sum of the most long-range correlated
sequence $x$ (the one with $\gamma_X = \max_j\{\gamma_{X^{(j)}}\}$) and many
other independent non-overlapping [43] sequences whose
combined contribution is written as $y = \xi(1 - x)$, with
$\xi_i$ an independent identically distributed binary random
variable. This corresponds to the random addition of 1's
with probability $\langle \xi \rangle$ to the 0's of $x$. In this case $\sigma_Z^2$ shows
a transition from normal ($\hat{\gamma}_Z = 1$) to anomalous ($\hat{\gamma}_Z = \gamma_X$)
diffusion. The asymptotic regime of $z$ starts after a time

$$t_T \geq \left( \frac{\langle \xi \rangle}{1 - \langle \xi \rangle} \, \frac{1}{g \langle x \rangle} \right)^{1/(\gamma_X - 1)}, \qquad (7)$$

where $0 < g \leq 1$ and $\gamma_X > 1$ are obtained from $\sigma_X^2$,
which asymptotically goes as $g \langle x \rangle (1 - \langle x \rangle) t^{\gamma_X}$. Note that
the power law sets in at $t = 1$ only if $g = 1$. A similar
relation is obtained moving up in the hierarchy, in which
case a sequence $x$ in a higher level is built by randomly
subtracting 1's from the lower-level sequence $z$ as $x = \xi z$
(see SI-Sec. III-A for all calculations).
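
A short numerical sketch of the transition-time estimate may help in reading Eq. (7). One way to see where it comes from, under the random-addition assumptions stated above, is that the variance of the walker of $z$ splits into a correlated part $(1 - \langle \xi \rangle)^2 \sigma_X^2(t)$ and an uncorrelated part $\langle \xi \rangle (1 - \langle \xi \rangle)(1 - \langle x \rangle)\, t$; equating the two gives the bound. The function name `transition_time` and the input values below are hypothetical and are not the fitted values of any book.

```python
def transition_time(xi_mean, g, x_mean, gamma_x):
    """Lower bound of Eq. (7) for the onset of the asymptotic regime of z.

    xi_mean : probability <xi> of adding a 1 on a 0 of x
    g       : prefactor of the asymptotic transport of x, 0 < g <= 1
    x_mean  : density <x> of 1's in x
    gamma_x : asymptotic exponent of x, gamma_x > 1
    """
    if not (0 < xi_mean < 1 and 0 < g <= 1 and 0 < x_mean < 1 and gamma_x > 1):
        raise ValueError("parameters outside the range assumed in Eq. (7)")
    ratio = xi_mean / (1 - xi_mean) / (g * x_mean)
    return ratio ** (1 / (gamma_x - 1))


# Hypothetical values, chosen only so that the arithmetic is easy to follow.
t_easy = transition_time(xi_mean=0.5, g=1.0, x_mean=0.5, gamma_x=2.0)
t_small_g = transition_time(xi_mean=0.5, g=0.5, x_mean=0.5, gamma_x=2.0)
```

The bound grows when the sequence $x$ is sparse, when many random 1's are added, and when $\gamma_X$ is close to 1, which is the situation of a keyword embedded in a frequent letter.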

Burstiness. In contrast to correlations, burstiness
due to the tails of the inter-event time distribution $p(\tau)$
is not always preserved when moving up and down in the
hierarchy of levels. Consider first going down by adding
sequences with different tails of $p(\tau)$. The tail of the combined
sequence will be constrained to the shortest tail of
the individual sequences. In the random addition example,
$z = x + \xi(1 - x)$ with $x$ having a broad tail in $p(\tau)$,
the large-$\tau$ asymptotics of $z$ has short tails because the
clusters of zeros in $x$ are cut randomly by $\xi$ [32]. Going up
in the hierarchy, we take a sequence on top of a given
bursty binary sequence, e.g., using the random subtraction
$x = \xi z$ mentioned above. The probability of finding
a large inter-event time $\tau$ in $x$ is enhanced by the number
of times the random deletion merges two or more clusters
of 0's in $z$, and diminished by the number of times the
deletion destroys a previously existent inter-event time $\tau$.
Even accounting for the change in $\langle \tau \rangle$, these moves cannot
lead to a short-ranged $p(\tau)$ for $x$ if $p(\tau)$ of $z$ has a long
tail (see SI-Sec. III-B). Altogether, we expect burstiness
to be preserved moving up, and destroyed moving down
in the hierarchy of levels.

Summary. From Eq. (4) the origin of long-range
correlations $\gamma > 1$ can be traced back to two different
sources: the tail of $p(\tau)$ (burstiness) and the tail of
$C_\tau(k)$. The computations above reveal their different roles
at different levels in the hierarchy: $\gamma$ is preserved moving
down, but there is a transfer of information from $p(\tau)$
to $C_\tau(k)$. This is better understood by considering the
following simplified set-up: suppose at a given level we
observe a sequence $x$ coming from a renewal process with
broad tails in the inter-event times

$$p(\tau) \sim \tau^{-\mu} \quad \text{and} \quad C_\tau(k) = \delta(k), \qquad (8)$$

with $2 < \mu < 3$ leading to $\gamma_X = 4 - \mu$ [IS]. Let us now
consider what is observed in $z$, at a level below, obtained
by adding to $x$ other independent sequences. The long $\tau$'s
(a long sequence of 0's) in Eq. (8) will be split in two
long sequences, introducing at the same time a cut-off $\tau_c$
in $p(\tau)$ and non-trivial correlations $C_\tau(k) \neq 0$ for large $k$.
In this case, asymptotically the long-range correlations
($\gamma_Z = \max\{\gamma_X, \gamma_Y\} > 1$) are solely due to $C_\tau(k) \neq 0$.
Burstiness affects only $\hat{\gamma}$ estimated for times $t < \tau_c$. A
similar picture is expected in the generic case of a starting
sequence $x$ with broad tails in both $p(\tau)$ and $C_\tau(k)$.

FIG. 2: Burstiness and long-range correlation on different
linguistic levels. The binary sequences of the letter "e" (a,b)
and of the word " prince " (c,d) in the book "War and Peace"
are shown. (a,c) The cumulative inter-event time distribution
$P(\tau) = \int_{\tau}^{\infty} p(\tau')\,d\tau'$. (b,d) Transport $\sigma_X^2(t)$ defined in Eq. (3).
The numerical results show: (a) exponential decay of $P(\tau)$
with $\sigma_\tau/\langle\tau\rangle = 0.83$. Inset: $p(\tau)$ in log-linear scales; (b) $\hat{\gamma} = 1.39 \pm 0.05$;
(c) non-exponential decay of $P(\tau)$ with $\sigma_\tau/\langle\tau\rangle = 3.86$;
and (d) $\hat{\gamma} = 1.68 \pm 0.05$. All panels show results for
the original and the A1-, A2-shuffled sequences, see legend.



II. DATA ANALYSIS OF LITERARY TEXTS 

Equipped with the theoretical framework of the previous section,
here we interpret observations in real texts. We
use ten English versions of international novels (see SI-Sec. IV
for the list and for the pre-processing applied to
the texts). For each book 41 binary sequences were analyzed
separately: vowel/consonants, 20 at the letter level
(blank space and the 19 most frequent letters), and 20 at
the word level (6 most frequent words, 7 most frequent
nouns, and 7 words with frequency matched to the frequency
of the nouns). The finite-time estimator of the
long-range correlations $\hat{\gamma}$ was computed by fitting Eq. (3)
in a broad range of large $t \in [t_{s'}, t_s]$ (time lag of correlations)
up to $t_s = 1\%$ of the book size. This range
was obtained using a conservative procedure designed to
robustly distinguish between short- and long-range correlations
(see SI-Sec. V). We illustrate the results in our
longest novel, "War and Peace" by L. Tolstoy (wrnpc, in
short, see SI-Tables for the results in all books).

FIG. 3: Burstiness-correlation diagram for sequences at different
levels. $\sigma_\tau/\langle\tau\rangle$ is an indicator of the burstiness of the
distribution $p(\tau)$. $\hat{\gamma}$ is a finite-time estimator of the global indicator
of long-range correlation $\gamma$. A Poisson process has
$(\sigma_\tau/\langle\tau\rangle, \gamma) = (1, 1)$. The twenty most frequent symbols
(white circles) and twenty frequent words (black circles) of
wrnpc are shown (see SI-Tables for all books). V indicates
the case of vowels and B of blank space. The red dashed
line is a lower-bound estimate of $\hat{\gamma}$ due to burstiness (see
SI-Sec. VI). This diagram is a generalization for long-range
correlated sequences of the diagrams in Ref. [33].



A. Data analysis of correlations and burstiness 

One of the main goals of our measurements is to distinguish,
at different hierarchy levels, between the two
possible sources of long-range correlations in Eq. (4):
burstiness corresponding to $p(\tau)$ with diverging $\sigma_\tau$, or
diverging $\sum_k C_\tau(k)$. To this end we compare the results
with two null-model binary sequences $x_{A1}$, $x_{A2}$ obtained
by applying to $x$ the following procedures:

A1: shuffle the sequence of {0, 1}'s. Destroys all correlations.

A2: shuffle the sequence of inter-event times $\tau_j$'s. Destroys
correlations due to $C_\tau(k)$ but preserves those
due to $p(\tau)$.
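
The two null models can be sketched in a few lines of Python. The function names `shuffle_a1` and `shuffle_a2` are hypothetical, and keeping the zeros before the first 1 and after the last 1 in place during A2 is an assumption of this sketch, not a detail taken from the procedure described above.

```python
import numpy as np


def shuffle_a1(x, rng):
    """A1: random permutation of the 0's and 1's of x."""
    return rng.permutation(np.asarray(x))


def shuffle_a2(x, rng):
    """A2: random permutation of the inter-event times of x.

    The positions of the first and of the last 1 are left unchanged.
    """
    x = np.asarray(x)
    ones = np.flatnonzero(x)
    if ones.size < 2:
        return x.copy()
    tau = rng.permutation(np.diff(ones))          # shuffled inter-event times
    new_pos = ones[0] + np.concatenate(([0], np.cumsum(tau)))
    y = np.zeros_like(x)
    y[new_pos] = 1
    return y


rng = np.random.default_rng(1)
x_demo = np.array([0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0])
x_a1 = shuffle_a1(x_demo, rng)
x_a2 = shuffle_a2(x_demo, rng)
```

Both shuffles keep the number of 1's; only A2 keeps the multiset of inter-event times, which is why it preserves the contribution of $p(\tau)$ while removing the ordering of the $\tau_j$'s.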

Starting from the lowest level of the hierarchy depicted
in Fig. 1, we obtain $\hat{\gamma} = 1.55 \pm 0.05$ for the sequence
of vowels in wrnpc and $\hat{\gamma}$ between 1.18 and 1.61 in the
other 9 books (see SI-Fig. S1). The values for $x_{A1}$ and
$x_{A2}$ were compatible (two error bars) with the expected
value $\gamma = 1.0$ in all books. Figures 2(a,b) show the computations
for the case of the letter "e": while $p(\tau)$ decays exponentially
in all cases (Fig. 2(a)), long-range correlations
are present in the original sequence e but absent from the
A2-shuffled version of e (Fig. 2(b)). This means that burstiness
is absent from e and does not contribute to its long-range
correlations. In contrast, for the word " prince "
Fig. 2(c) shows a non-exponential $p(\tau)$ and Fig. 2(d) shows
that the original sequence prince and the A2-shuffled
sequence show similar long-range correlations (black and
red curves, respectively). This means that the
long-range correlations of prince are mainly due to
burstiness (tails of $p(\tau)$) and not to correlations in the
sequence of $\tau_j$'s ($C_\tau(k)$).

In Fig. 3 we plot for different sequences the summary
quantities $\hat{\gamma}$ and $\sigma_\tau/\langle\tau\rangle$, a measure of the burstiness
proportional to the relative width of $p(\tau)$ [33, 34]. A
Poisson process has $\gamma = \sigma_\tau/\langle\tau\rangle = 1$. All letters have
$\sigma_\tau/\langle\tau\rangle \approx 1$, but clear long-range correlations $\hat{\gamma} > 1.1$
(left box magnified in Fig. 3). This means that correlations
come from $C_\tau(k)$ and not from $p(\tau)$, as shown
in Fig. 2(a,b) for the letter "e". The situation is more
interesting in the higher-level case of words. The most
frequent words and the words selected to match the nouns
mostly show $\sigma_\tau/\langle\tau\rangle \approx 1$ so that the same conclusions we
drew about letters apply to these words. In contrast to
this group of function words are the most frequent nouns
that have large $\sigma_\tau/\langle\tau\rangle$ [TH1IH1IM1I1S] and large $\hat{\gamma}$, appearing
as outliers at the upper right corner of Fig. 3. The
case of " prince " shown in Fig. 2(c,d) is representative of
these words, for which burstiness contributes to the long-range
correlations. In order to confirm the generality of
Fig. 3 in the 10 books of our database, we performed a
pairwise comparison of $\hat{\gamma}$ and $\sigma_\tau/\langle\tau\rangle$ between the 7 nouns
and their frequency-matched words. Overall, the nouns
had a larger $\hat{\gamma}$ in 56 and a larger $\sigma_\tau/\langle\tau\rangle$ in 55 out of the
70 cases (P-value $< 10^{-6}$, assuming equal probability).
In every single book at least 4 out of 7 comparisons show
larger values of $\hat{\gamma}$ and $\sigma_\tau/\langle\tau\rangle$ for the nouns.
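
The quoted significance can be checked with an exact binomial (sign) test. The sketch below assumes a one-sided test with success probability 1/2, which is one reading of "assuming equal probability"; the function name `sign_test_p` is hypothetical.

```python
from math import comb


def sign_test_p(successes, trials):
    """One-sided P(X >= successes) for X ~ Binomial(trials, 1/2)."""
    tail = sum(comb(trials, k) for k in range(successes, trials + 1))
    return tail / 2 ** trials


# Nouns versus frequency-matched words, 70 pairwise comparisons.
p_gamma = sign_test_p(56, 70)       # larger finite-time exponent in 56 cases
p_burstiness = sign_test_p(55, 70)  # larger sigma_tau / <tau> in 55 cases
```

Under this one-sided convention both counts give probabilities below $10^{-6}$, consistent with the value reported above.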

We now explain a striking feature of the data shown
in Fig. 3: the absence of sequences with low $\hat{\gamma}$ and high
$\sigma_\tau/\langle\tau\rangle$ (lower-right corner). This is evidence of a correlation
between these two indicators and motivates us
to estimate a $\sigma_\tau/\langle\tau\rangle$-dependent lower bound for $\hat{\gamma}$, as
shown in Fig. 3. Note that high values of burstiness are
responsible for long-range correlation estimates $\hat{\gamma} > 1$,
as discussed after Eq. (8). For instance, the slow decay
of $p(\tau)$ for intermediate $\tau$ in prince (Fig. 2(c)) leads
to $\sigma_\tau/\langle\tau\rangle \gg 1$ and an estimate $\hat{\gamma} > 1$ at intermediate
times. The burstiness contribution to $\hat{\gamma}$ (which also gets
contributions from long-range correlations in the $\tau_j$'s) is measured
by $\hat{\gamma}_{A2}$, which is usually a lower bound for the total
long-range correlations: $\hat{\gamma} \geq \hat{\gamma}_{A2}$. More quantitatively,
consider an A2-shuffled sequence with power-law $p(\tau)$,
as in Eq. (8), with an exponential cut-off for $\tau > \tau_c$.
By increasing $\tau_c$ we have that $\sigma_\tau/\langle\tau\rangle$ increases monotonically
[it can be computed directly from $p(\tau)$]. In terms
of $\hat{\gamma}_{A2}$, if the fitting interval $t \in [t_{s'}, t_s]$ used to compute
the finite-time $\hat{\gamma}_{A2}$ is all below $\tau_c$ (i.e., $t_s < \tau_c$) we have
$\hat{\gamma}_{A2} = 4 - \mu > 1$ (see Eq. (8)), while if the fitting interval
is all beyond the cutoff (i.e., $\tau_c < t_{s'}$) we have $\hat{\gamma}_{A2} = 1$.
Interpolating linearly between these two values and using
$\mu = 2.4$ we obtain the lower bound for $\hat{\gamma}$ in Fig. 3.
It strongly restricts the range of possible $(\sigma_\tau/\langle\tau\rangle, \hat{\gamma})$ in
agreement with the observations and also with $\hat{\gamma}$ obtained
for the A2-shuffled sequences (see SI-Sec. VI for further
details).



B. Data analysis of finite-time effects 

The pre-asymptotic normal diffusion, anticipated in
Sec. I D (Finite-time effects), is clearly seen in Fig. 4.
Our theoretical model also explains other specific observations:

1. Keywords reach higher values of $\hat{\gamma}$ than letters
($\hat{\gamma}_e < \hat{\gamma}_{prince}$). This observation contradicts our expectation
for asymptotically long times: prince is on top of e and
the reasoning after Eq. (6) implies $\gamma_e \geq \gamma_{prince}$. This
seeming contradiction is resolved by our estimate (7) of
the transition time $t_T$ needed for the finite-time estimate
$\hat{\gamma}$ to reach the asymptotic $\gamma$. This is done by imagining a
surrogate sequence with the same frequency as "e", composed
of prince and randomly added 1's. Using the
fitting values of $g$, $\gamma$ for prince in Eq. (7) we obtain
$t_T \geq 6 \cdot 10^5$, which is larger than the maximum time $t_s$
used to obtain $\hat{\gamma}$. Conversely, for a sequence with the
same frequency as " prince " built as a random sequence
on top of e we obtain $t_T \geq$ 7 10 s . These calculations
not only explain $\hat{\gamma}_e < \hat{\gamma}_{prince}$, they show that prince
is a particularly meaningful (not random) sequence on
top of e, and that e is necessarily composed of other sequences
with $1 < \gamma < \gamma_{prince}$ that dominate for shorter
times. More generally, the observation of long-range correlations
at low levels is due to widespread correlations
on higher levels.

2. The sharper transition for keywords. The addition of
many sequences with $\gamma > 1$ explains the slow increase in
$\hat{\gamma}(t)$ for letters because sequences with increasingly larger
$\gamma$ dominate for increasingly longer times. The same reasoning
explains the positive correlation between $\hat{\gamma}_e$ and
the length of the book (Pearson correlation r = 0.44,
similar results for other letters). The sequence so also
shows a slow transition and small $\hat{\gamma}$, consistent with the
interpretation that it is connected to many topics on upper
levels. In contrast, the sharp transition for prince indicates
the existence of fewer independent contributions on
higher levels, consistent with the observation of the onset
of burstiness $\sigma_\tau/\langle\tau\rangle > 1$. Altogether, this strongly supports
our model of a hierarchy of levels with keywords (but
not function words) strongly connected to specific topics,
which are the actual correlation carriers. The sharp transition
for the keywords appears systematically roughly at
the scale of a paragraph ($10^2$ - $10^3$ symbols), in agreement
with similar observations in Refs. [21 [2Ql [221 136].



C. Data analysis of shuffled texts 

Additional insights on long-range correlations are obtained
by investigating whether they are robust under
different manipulations of the text [21 E]. Here we focus
on two non-trivial shuffling methods (see SI-Sec. VII
for simpler cases for which our theory leads to analytic
results). Consider generating new same-length texts by
applying to the original texts the following procedures:

M1 Keep the position of all blank spaces fixed and place
each word-token randomly in a gap of the size of
the word.

M2 Recode each word-type by an equal-length random
sequence of letters and replace consistently all its
tokens.

Note that M1 preserves structures (e.g., words and letter
frequencies) destroyed by M2. In terms of our hierarchy,
M1 destroys the links to levels above word level while
M2 shuffles the links from word to letter levels. Since
according to our picture correlations originate from high-level
structures, we predict that M1 destroys and M2
preserves long-range correlations. Indeed, simulations unequivocally
show that long-range correlations present in
the original texts (average $\hat{\gamma}$ of letters in wrnpc 1.40 ± 0.09
and in all books 1.26 ± 0.11) are mostly destroyed by
M1 (1.10 ± 0.08 and 1.07 ± 0.08) and preserved by M2
(1.33 ± 0.08 and 1.20 ± 0.09) (see SI-Tables for all data).
At this point it is interesting to draw a connection to
the principle of the arbitrariness of the sign, according
to which the association between a given sign (e.g., a
word) and the referent (e.g., the object in the real world)
is arbitrary [37]. As confirmed by the M2 shuffling, the
long-range correlations of literary texts are invariant under
this principle because they are connected to the semantics
of the text. Our theory is consistent with this
principle.
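
The two text manipulations can be sketched as follows. This is an illustrative implementation under simplifying assumptions: words are taken to be the maximal runs of non-blank symbols, the random codes of M2 are drawn from the 26 lower-case letters, and two different word-types are not prevented from receiving the same code. The function names `shuffle_m1` and `shuffle_m2` are hypothetical.

```python
import random
from collections import defaultdict


def shuffle_m1(text, rng):
    """M1: permute word-tokens among the gaps of equal length; blanks stay fixed."""
    words = text.split(" ")
    slots = defaultdict(list)           # word length -> positions in `words`
    for i, w in enumerate(words):
        slots[len(w)].append(i)
    out = list(words)
    for positions in slots.values():
        sources = positions[:]
        rng.shuffle(sources)
        for dst, src in zip(positions, sources):
            out[dst] = words[src]
    return " ".join(out)


def shuffle_m2(text, rng, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """M2: replace every word-type by a random string of the same length."""
    code = {}
    out = []
    for w in text.split(" "):
        if w not in code:
            code[w] = "".join(rng.choice(alphabet) for _ in w)
        out.append(code[w])             # same type -> same code everywhere
    return " ".join(out)


rng = random.Random(0)
sample = "the prince and the king and the queen"
sample_m1 = shuffle_m1(sample, rng)
sample_m2 = shuffle_m2(sample, rng)
```

Both outputs have the same length and the same blank positions as the input. M1 keeps the words but loses their order, whereas M2 keeps the order of word-types but loses their spelling, which mirrors the distinction drawn above between links above and below the word level.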
FIG. 4: Transition from normal to anomalous behavior.
The time-dependent exponent is computed as
$\hat{\gamma}(t) = \Delta \log \sigma_X^2(t) / \Delta \log t$ (local derivative of the transport curve in
Fig. 2(b,d)). Results for three sequences in wrnpc are shown
(from top to bottom): the noun " prince ", the most frequent
letter "e", and the word " so " (same frequency as " prince ").
The horizontal lines indicate the $\hat{\gamma}$, the error bars, and
the fitting range. Inset (from top to bottom): the 4 other
nouns appearing as outliers in Fig. 3, the 4 most frequent letters
after "e", and the 4 words matching the frequency of the
outlier nouns.



III. DISCUSSION 

From an information theory viewpoint, long-range correlations
in a symbolic sequence have two different and
concurrent sources: the broad distribution of the distances
between successive occurrences of the same symbol
(burstiness) and the correlations of these distances.
We found that the contribution of these two sources is
very different for observables of a literary text at different
linguistic levels. In particular, our theoretical framework
provides a robust mechanism explaining our extensive
observations that on relevant semantic levels the text is
high-dimensional and bursty, while on lower levels successive
projections destroy burstiness while preserving the
long-range correlations of the encoded text via a flow of
information from burstiness to correlations.

The mechanism explaining how correlations cascade
from high to low levels is generic and extends to levels
higher than word level in the hierarchy in Fig. 1.
The construction of such levels could be based, e.g., on
techniques devised to extract information on a "concept
space" [21 [221 13S]. While long-range correlations have
been observed at the concept level [2], further studies are
required to connect to observations made at lower levels
and to distinguish between the two sources of correlations.
Our results showing that correlation is preserved
after random additions/subtractions of 1's help this connection
because they show that words can be linked to
concepts even if they are not used every single time the
concept appears (a high probability suffices). For instance,
in Ref. [2] a topic can be associated to an axis
of the concept space and be linked to the words used
to build it. In this case, when the text is referring to
a topic there is a higher probability of using the words
linked to it and therefore our results show that correlations
will flow from the topic to the word level. In
further higher levels, it is insightful to consider as a limit
picture the renewal case, Eq. (8), for which long-range
correlations originate only from burstiness. This limit
case is the simplest toy model compatible with our results.
Our theory predicts that correlations take the form of
a bursty sequence of events once we approach the semantically
relevant topics of the text. Our observations show
that some highly topical words already show long-range
correlations mostly due to burstiness, as expected by observing
that topical words are connected to fewer concepts
than function words [35]. This renewal limit case is the
desired outcome of a successful analysis of anomalous diffusion
in dynamical systems and has been speculated to
appear in various fields []j|l [32]. Using this limit case as
a guideline we can think of an algorithm able to automatically
detect the relevant structures in the hierarchy
by pushing recursively the long-range correlations into a
renewal sequence.

Next we discuss how our results improve previous anal- 
yses and open new possibilities of applications. Previous 
methods either worked below the letter level [TJ |2"5H2"T] 
or combined the correlations of different letters in such a 
way that asymptotically the most long-range correlated 
sequence dominates [6j HI [TT] . Only through our results 
it is possible to understand that indeed a single asymptotic
exponent $\gamma$ should be expected in all these cases. However,
and more importantly, $\gamma$ is usually beyond the observational
range, and an interesting range of finite-time estimates
$\hat\gamma$ is obtained depending on the observable or encoding.
On the letter level, our analysis (Figs. 2 and 3) revealed that
all letters are long-range correlated with no burstiness
(exponentially distributed inter-event times). This lack of
burstiness can be wrongly interpreted as an indication that
letters [33] and most parts of speech [35] are well described by
a Poisson process. Our results explain that the non-Poissonian
(and thus information-rich) character of the text is preserved in
the form of long-range correlations ($\gamma > 1$), which is
observed also for all frequent words (even for the most frequent
word "the"). These observations not only violate the strict
assumption of a Poisson process, they are also incompatible with
any finite-state Markov chain model. These models are the basis
for numerous applications of automatic semantic information
extraction, such as keyword extraction, authorship attribution,
plagiarism detection, and automatic summarization [12-15]. All
these applications can potentially benefit from our deeper
understanding of the mechanisms leading to long-range
correlations in texts.

Apart from these applications, more fundamental ex- 
tensions of our results should: (i) consider the mutual in- 
formation and similar entropy-related quantities, which 
have been widely used to quantify long-range correla- 
tions [BJ |2] (see [53] for a comparison to correlations); 
(ii) go beyond the simplest case of the two point au- 
tocorrelation function and consider multi-point correla- 
tions or higher order entropies [BJ, which are necessary 
for the complete characterization of the correlations of a 
sequence; and (iii) consider the effect of non-stationarity 
on higher levels, which could cascade to lower levels and 
affect correlation properties. Finally, we believe that our
approach may help to understand long-range correlations
in any complex system for which a hierarchy of levels
can be identified, such as human activities [BJ and DNA 
sequences [5 HTT1 135] , 

I. AVERAGE PROCEDURE IN BINARY
SEQUENCES

Given an ergodic and stationary stochastic process,
correlation functions are defined as

$$\mathrm{Corr}(j,t) := E(x_j x_{j+t}) - E(x_j)\,E(x_{j+t}), \qquad (9)$$

where $E(\cdot)$ denotes an average over different realizations
x of the process. Stationarity guarantees that Corr(j, t)
depends on the time lag t only. In practice, one typically has
no access to different realizations of the process but only to a
single finite sequence. In our case, any binary sequence x is
obtained from a single text of length N through a given mapping.
In such cases it is possible to use the assumption of ergodicity
to approximate the correlation function (9) by

$$C_x(t) := \langle x_j x_{j+t}\rangle - \langle x_j\rangle\langle x_{j+t}\rangle,$$

where $\langle\cdot\rangle$ means averaging, for each fixed t,
over all pairs $x_j$ and $x_{j+t}$ for $j = 1, 2, \ldots, (N - t)$ as

$$\langle\cdot\rangle \equiv \frac{1}{N-t}\sum_{j=1}^{N-t}(\cdot).$$
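
The single-sequence estimate of the correlation function described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the binary sequence is stored as a list of 0's and 1's, and the function name `autocorrelation` is hypothetical (it does not come from the original analysis).

```python
def autocorrelation(x, t):
    """Estimate C_x(t) = <x_j x_{j+t}> - <x_j><x_{j+t}> from one binary sequence.

    All three averages run over the N - t available pairs (x_j, x_{j+t}).
    """
    n_pairs = len(x) - t
    if n_pairs <= 0:
        raise ValueError("the lag t must be smaller than the sequence length")
    head = x[:n_pairs]   # x_j,     j = 1 .. N - t
    tail = x[t:]         # x_{j+t}, j = 1 .. N - t
    mean_product = sum(a * b for a, b in zip(head, tail)) / n_pairs
    return mean_product - (sum(head) / n_pairs) * (sum(tail) / n_pairs)


# Usage: a strictly periodic sequence is positively correlated at lag 2
# and negatively correlated at lag 1.
periodic = [1, 0] * 50
print(autocorrelation(periodic, 2), autocorrelation(periodic, 1))
```

The cost is linear in N for each lag, so scanning many lags on a full book is better done on the associated random walk, as in the main text.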



II. MAPPING EXAMPLES 

Consider the sentence "This paper is a paper of mine". By
choosing the condition $\alpha$ to be "the k-th symbol is a
vowel", the projection $f_\alpha$ maps the sentence into the
sequence {00100010100100100101001000101}. If $\alpha$ is "the
k-th symbol is equal to 'e'" then we get
{00000000100000000001000000001}. Generally, we can treat any
n-gram of letters in the same way, for example by choosing the
condition $\alpha$ to be "the 2-gram starting at the k-th symbol
is equal to 'er'", which projects the sentence, using a sliding
window, to {00000000100000000001000000000}. Words are encoded
using their corresponding n-gram; for example, $\alpha$ could be
"the 7-gram starting at the k-th symbol is equal to ' paper '"
(blank spaces included), which gives
{00001000000000010000000000000}. It is possible to generalize
these procedures to more semantic conditions $\alpha$ that
associate 1 to either all or part of the symbols that appear in
a sentence that is attached to a specified topic.
These topics can be quantitatively constructed from the 
frequency of words using methods such as latent semantic 
analysis pQ or the procedures to determine the so-called 
concept space [2]. 
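
The letter and n-gram mappings of this section can be reproduced with a short Python sketch. The function names are hypothetical, and one convention is an assumption: positions too close to the end of the text for the n-gram window to fit are encoded as 0, so every output has the same length as the text.

```python
SENTENCE = "This paper is a paper of mine"


def encode_symbols(text, condition):
    """Binary sequence with a 1 wherever `condition` holds for the k-th symbol."""
    return "".join("1" if condition(ch) else "0" for ch in text)


def encode_ngram(text, ngram):
    """Binary sequence with a 1 wherever `ngram` starts at the k-th symbol.

    Assumption: positions where the window does not fit are encoded as 0.
    """
    n = len(ngram)
    return "".join(
        "1" if text[k:k + n] == ngram else "0" for k in range(len(text))
    )


vowels = encode_symbols(SENTENCE, lambda ch: ch.lower() in "aeiou")
letter_e = encode_ngram(SENTENCE, "e")
bigram_er = encode_ngram(SENTENCE, "er")
word_paper = encode_ngram(SENTENCE, " paper ")  # blank spaces included

for label, sequence in [
    ("vowel", vowels),
    ("'e'", letter_e),
    ("'er'", bigram_er),
    ("' paper '", word_paper),
]:
    print(f"{label:10s} {sequence}")
```

Topic-level conditions would need an external assignment of sentences to topics and are not covered by this sketch.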



III. SIMPLE OPERATIONS ON BINARY 
SEQUENCE AND THEIR EFFECTS ON 
LONG-RANGE CORRELATIONS AND 
BURSTINESS 

We describe two simple procedures to construct two binary
sequences x and z such that x is on top of z. These procedures
will be based either on the "addition" of 1's to x or on the
"subtraction" of 1's from z. In the simplest cases of random
addition and subtraction, we explicitly compute how long-range
correlations flow from x to z (corresponding to a flow from upper
to lower levels of the hierarchy) and how burstiness is preserved
when extracting x from z (moving from lower to upper levels in
the hierarchy).

Recall that a sequence x is on top of z if for all j such that
$x_j = 1$ we have $z_{j+r} = 1$, for a fixed constant r. Without
loss of generality, in the following calculations we fix for
simplicity r = 0. We now define simple operations that map two
binary sequences into a third binary sequence:

• Given two generic binary sequences z and $\xi$ we define
their multiplication $y = \xi z$ as $y_i = \xi_i z_i$,
$\forall i$. By construction y is on top of z.

• Given two non-overlapping sequences x and y we define their
sum z = x + y as $z_i = x_i + y_i$, $\forall i$. By
construction x and y are on top of z. We say that sequences x
and y are non-overlapping if for all i for which $x_i = 1$ we
have $y_i = 0$.

In general, two independent binary sequences x and $\xi$ will
overlap. A sequence y which is non-overlapping with x can be
constructed from $\xi$ as $y = \xi(\mathbf{1} - x)$, where
$\mathbf{1}$ denotes the trivial sequence with all 1's. In this
case, we say that z = x + y, with $y = \xi(\mathbf{1} - x)$, is a
sequence lower than x in the hierarchy that is constructed by a
random addition (of 1's) to x. Similarly, if $\xi$ is independent
of z, the sequence $\xi z$ is a random subtraction (of 1's) of z.
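
The operations defined in this section are simple enough to write down directly. The sketch below is a minimal illustration, assuming sequences are Python lists of 0's and 1's and that the auxiliary sequence is drawn as i.i.d. Bernoulli variables; all function names are hypothetical.

```python
import random


def is_on_top(x, z):
    """True if every 1 of x sits on a 1 of z (shift r = 0)."""
    return all(zj == 1 for xj, zj in zip(x, z) if xj == 1)


def multiply(xi, z):
    """Element-wise product y_i = xi_i * z_i."""
    return [a * b for a, b in zip(xi, z)]


def add(x, y):
    """Element-wise sum z_i = x_i + y_i of two non-overlapping sequences."""
    if any(a == 1 and b == 1 for a, b in zip(x, y)):
        raise ValueError("sequences overlap")
    return [a + b for a, b in zip(x, y)]


def bernoulli(n, rate, rng):
    """i.i.d. binary sequence with mean `rate` (assumed model for xi)."""
    return [1 if rng.random() < rate else 0 for _ in range(n)]


def random_addition(x, rate, rng):
    """z = x + xi (1 - x): add 1's at random on the empty sites of x."""
    xi = bernoulli(len(x), rate, rng)
    y = multiply(xi, [1 - a for a in x])
    return add(x, y)


def random_subtraction(z, rate, rng):
    """xi z: keep each 1 of z with probability `rate`."""
    return multiply(bernoulli(len(z), rate, rng), z)


rng = random.Random(0)
x = bernoulli(1000, 0.1, rng)
z = random_addition(x, 0.3, rng)      # lower in the hierarchy than x
w = random_subtraction(z, 0.5, rng)   # higher in the hierarchy than z
print(is_on_top(x, z), is_on_top(w, z))  # both hold by construction
```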

A. Transition time from normal to anomalous 
diffusion 

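Consider a sequence z constructed as a random addition of 1's to
a given long-range correlated sequence x: z = x + y, with
$y = \xi(\mathbf{1} - x)$ and $\xi$ a sequence of i.i.d. binary
random variables. The associated random walker Z spreads
anomalously with the same exponent as X. This asymptotic regime
is masked at short times by a pre-asymptotic normal behavior.
Here we first compute explicitly the spreading of Z in terms of
that of X and Y and then we compute a bound for the transition
time $t_T$ to the asymptotic anomalous diffusion of Z.

As written in Eq. (5) of the main text we have

$$\sigma_Z^2(t) = \sigma_X^2(t) + \sigma_Y^2(t) + 2\,C(X(t), Y(t)). \qquad (10)$$

For our particular case we obtain

$$\langle Y(t)\rangle = \langle\xi\rangle\, t\,(1 - \langle x\rangle) \qquad (11)$$

and, writing $\tilde{x}_i \equiv 1 - x_i$,

$$
\begin{aligned}
\langle Y(t)^2\rangle
&= \Big\langle \sum_{i=1}^{t} \xi_i^2\,\tilde{x}_i^2 \Big\rangle
 + \Big\langle \sum_{i,j=1,\, i\neq j}^{t} \xi_i \xi_j\, \tilde{x}_i \tilde{x}_j \Big\rangle \\
&= \sum_{i=1}^{t} \langle \tilde{x}_i^2\rangle \langle \xi^2\rangle
 + \sum_{i,j=1,\, i\neq j}^{t} \langle \tilde{x}_i \tilde{x}_j\rangle \langle \xi\rangle^2 \\
&= \sum_{i=1}^{t} \langle \tilde{x}_i^2\rangle \langle \xi^2\rangle
 - \sum_{i=1}^{t} \langle \tilde{x}_i^2\rangle \langle \xi\rangle^2
 + \sum_{i=1}^{t} \langle \tilde{x}_i^2\rangle \langle \xi\rangle^2
 + \sum_{i,j=1,\, i\neq j}^{t} \langle \tilde{x}_i \tilde{x}_j\rangle \langle \xi\rangle^2 .
\end{aligned}
\qquad (12)
$$

From Eqs. (11) and (12), and noting that
$\sum_{i=1}^{t}\langle \tilde{x}_i^2\rangle = \sum_{i=1}^{t}\langle \tilde{x}_i\rangle = t(1 - \langle x\rangle)$
and $\langle \xi^2\rangle = \langle \xi\rangle$ (the last two sums
in Eq. (12) combine into $\langle\xi\rangle^2 \langle (t - X(t))^2\rangle$),
we obtain

$$\sigma_Y^2(t) = \langle Y^2(t)\rangle - \langle Y(t)\rangle^2
 = \langle\xi\rangle^2 \sigma_X^2(t) + \langle\xi\rangle (1 - \langle\xi\rangle)(1 - \langle x\rangle)\, t. \qquad (13)$$

The correlation term in Eq. (10) can also be obtained through
direct calculations:

$$
\begin{aligned}
C(X(t),Y(t)) &= \langle X(t)Y(t)\rangle - \langle X(t)\rangle\langle Y(t)\rangle \\
&= \Big\langle \sum_{i=1}^{t} x_i \sum_{j=1}^{t} \xi_j (1 - x_j) \Big\rangle
 - \Big\langle \sum_{i=1}^{t} x_i \Big\rangle \Big\langle \sum_{j=1}^{t} \xi_j (1 - x_j) \Big\rangle \\
&= -\langle\xi\rangle\, \sigma_X^2(t).
\end{aligned}
\qquad (14)
$$

Finally, inserting Eqs. (13) and (14) into Eq. (10) we have

$$
\begin{aligned}
\sigma_Z^2(t) &= \sigma_X^2(t) + \sigma_Y^2(t) + 2\,C(X(t),Y(t)) \\
&= (1 - \langle\xi\rangle)^2 \sigma_X^2(t) + \langle\xi\rangle (1 - \langle\xi\rangle)(1 - \langle x\rangle)\, t.
\end{aligned}
\qquad (15)
$$

As X superdiffuses so will Z, and they both have the same
asymptotic behavior. On the other hand, the asymptotic regime is
masked at short times by a pre-asymptotic normal behavior, given
by the linear term in t. We stress that, even if the
non-overlapping condition for y forces both $\sigma_Y^2(t)$ and
$C(X(t),Y(t))$ to have the same asymptotic behavior as
$\sigma_X^2(t)$, their cumulative contributions do not cancel out
unless we trivially have $\langle\xi\rangle = 1$.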
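
The relation of Eq. (15), read here as $\sigma_Z^2(t) = (1-\langle\xi\rangle)^2\sigma_X^2(t) + \langle\xi\rangle(1-\langle\xi\rangle)(1-\langle x\rangle)\,t$, holds for any stationary x that is independent of $\xi$, so it can be checked by direct simulation. The sketch below is only an illustration under assumptions that are not in the original: x is a persistent two-state Markov chain with stationary mean 1/2 (short-range correlated, chosen only because it is cheap to sample), averages are taken over independent realizations, and all names and parameter values are hypothetical.

```python
import random
from statistics import pvariance


def check_random_addition(n_real=4000, t=50, stay=0.9, xi_rate=0.3, seed=1):
    """Compare the measured variance of Z(t) with the prediction of Eq. (15).

    x: symmetric two-state Markov chain (assumed toy model, <x> = 1/2).
    xi: i.i.d. binary variables with mean `xi_rate`, independent of x.
    z = x + xi (1 - x) is the random addition of 1's to x.
    """
    rng = random.Random(seed)
    walkers_x, walkers_z = [], []
    for _ in range(n_real):
        state = 1 if rng.random() < 0.5 else 0   # stationary start
        big_x = big_z = 0
        for _ in range(t):
            if rng.random() > stay:              # flip with probability 1 - stay
                state = 1 - state
            xi = 1 if rng.random() < xi_rate else 0
            big_x += state
            big_z += state + xi * (1 - state)
        walkers_x.append(big_x)
        walkers_z.append(big_z)
    var_x = pvariance(walkers_x)
    var_z = pvariance(walkers_z)
    mean_x = 0.5
    predicted = (1 - xi_rate) ** 2 * var_x + xi_rate * (1 - xi_rate) * (1 - mean_x) * t
    return var_z, predicted


measured, predicted = check_random_addition()
print(f"measured {measured:.2f}  predicted {predicted:.2f}")
```

The two numbers should agree up to sampling noise; the agreement does not depend on x being long-range correlated, which is why a simple Markov chain suffices for the check.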

We now give a bound on the transition time $t_T$ to the
asymptotic anomalous diffusion of Eq. (15). Without loss of
generality consider the case in which even the asymptotic
anomalous behavior of X is masked by a generic pre-asymptotic
A(t) such that

$$\sigma_X^2(t) = \langle x\rangle (1 - \langle x\rangle)\left[(1 - g)\,A(t) + g\, t^{\gamma_X}\right]$$

with 0 < g < 1 and A(t) increasing and such that
$A(t)/t^{\gamma_X} \to 0$ for $t \to \infty$ (to guarantee that
the asymptotic behavior is dominated by $t^{\gamma_X}$) and
A(1) = 1 (as $\sigma_X^2(1) = \langle x\rangle(1 - \langle x\rangle)$).
The asymptotic behavior $\sigma_Z^2(t) \sim t^{\gamma_X}$ in
Eq. (15) dominates only after a time $t_T$ such that:

$$t_T^{\gamma_X} = \frac{\langle\xi\rangle\, t_T + (1 - g)\langle x\rangle (1 - \langle\xi\rangle)\, A(t_T)}{g\,(1 - \langle\xi\rangle)\langle x\rangle}. \qquad (16)$$

Using the fact that the term
$(1 - g)\langle x\rangle(1 - \langle\xi\rangle)A(t)$ is positive
and that $t^{\gamma_X}$ is monotonically increasing we finally
have

$$t_T \ge \left(\frac{\langle\xi\rangle}{1 - \langle\xi\rangle}\,\frac{1}{g\,\langle x\rangle}\right)^{1/(\gamma_X - 1)}, \qquad (17)$$

which corresponds to Eq. (7) of the main text. In practice, any
finite-time estimate $\hat\gamma_X$ is close to the asymptotic
$\gamma_X$ only if the estimate is performed for $t \gg t_T$;
otherwise $\hat\gamma_X < \gamma_X$ ($\hat\gamma_X = 1$ if
$t \ll t_T$).

As noted in the main text, if z = x + y then x = z - y. Applying
to this relation the same arguments as above, a similar
pre-asymptotic normal diffusion and transition time appear in the
case of random subtraction, moving up in the hierarchy. More
specifically, starting from a sequence z such that asymptotically
$\sigma_Z^2(t) \sim g\,\langle z\rangle(1 - \langle z\rangle)\,t^{\gamma_Z}$
and constructing $x = \xi z$, with $\xi$ independent of z, we
obtain a transition time $t_T$ for x given by:

$$t_T \ge \left(\frac{1 - \langle\xi\rangle}{\langle\xi\rangle}\,\frac{1}{g\,(1 - \langle z\rangle)}\right)^{1/(\gamma_Z - 1)}, \qquad (18)$$

which corresponds to Eq. (17) above after properly replacing
$\langle x\rangle \to (1 - \langle z\rangle)$,
$\langle\xi\rangle \to (1 - \langle\xi\rangle)$ and
$\gamma_X \to \gamma_Z$.

B. Random subtraction preserves burstiness

We consider the case of sequences as in Eq. (8) of the main text:
z is a sequence emerging from a renewal process with
algebraically decaying inter-event times, i.e. $p(\tau)$ has a
power-law tail and $C_\tau(k) = \delta(k)$. Given now a fixed
$0 < \langle\xi\rangle < 1$, we consider the random subtraction
$x = \xi z$ where each $z_j = 1$ is eventually set to $z_j = 0$
with probability $\langle\xi\rangle$. It is easy to see that the
inter-event times of the new process will be distributed as:

$$\tilde{p}(\tau) = (1 - \langle\xi\rangle)\,p(\tau) + \sum_{k>1}^{\infty} \langle\xi\rangle^{k} \sum_{\tau_1+\tau_2+\cdots+\tau_k=\tau}\ \prod_{j=1}^{k} p(\tau_j).$$

Asymptotically $\tilde{p}(\tau)$ is dominated by the long tails
of $(1 - \langle\xi\rangle)p(\tau)$: given a large $\tau$, fix
$\bar{k} > 0$ eventually diverging with $\tau \to \infty$ and
split accordingly the sum over k in the second term of the right
hand side. The term corresponding to the sum over $k > \bar{k}$
is exponentially dominated by $\langle\xi\rangle^{\bar{k}}$ and
arbitrarily small, while the remaining finite sum over
$k < \bar{k}$ is controlled again by the tail of $p(\tau)$.



IV. DATA

In our investigations we considered the English version of the 10
popular novels listed in SI-Tab. Books. The texts were obtained
through the Gutenberg project (http://www.gutenberg.org). We
implement a very mild pre-processing of the text that reduces the
number of different symbols and simplifies our analysis: we
consider as valid symbols the letters "a-z", numbers "0-9", the
apostrophe "'" and the blank space " ". Capitalization,
punctuation and other markers were removed. A string of symbols
between two consecutive blank spaces is considered to be a word.
No lemmatization was applied to them, so that plural and singular
forms are considered to be different words.
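
The mild pre-processing described for the data can be sketched as follows. Several details are assumptions, since the text does not specify them: line breaks and tabs are treated as blank spaces, every other invalid symbol is deleted rather than replaced, and runs of blank spaces are collapsed into one. The function names are hypothetical.

```python
import re


def preprocess(text):
    """Reduce a raw text to the symbols a-z, 0-9, apostrophe and blank space."""
    text = text.lower()                       # remove capitalization
    text = re.sub(r"\s+", " ", text)          # assumption: any whitespace is a blank
    text = re.sub(r"[^a-z0-9' ]", "", text)   # assumption: drop all other symbols
    return re.sub(r" +", " ", text).strip()   # assumption: collapse repeated blanks


def split_words(clean_text):
    """A word is a string of symbols between two consecutive blank spaces."""
    return clean_text.split(" ")


sample = "Don't  Panic!\nIt's 42."
clean = preprocess(sample)
print(clean)
print(split_words(clean))
```

Because no lemmatization is applied, forms such as "paper" and "papers" remain distinct words after this step.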



V. CONFIDENCE INTERVAL FOR 
DETERMINING LONG-RANGE CORRELATION 

As described in the main text, the distinction between long-range
and short-range correlation requires a finite-time estimate
$\hat\gamma$ of the asymptotic diffusion exponent $\gamma$ of the
random walkers associated with a binary sequence. In practice,
this corresponds to estimating the tails of the
$\sigma^2 \sim t^{\gamma}$ relation, and it is therefore
essential to estimate the upper limit in t, denoted as $t_s$, for
which we have enough accuracy to provide a reasonable estimate
$\hat\gamma$. We adopt the following procedure to estimate $t_s$.
We consider a surrogate binary sequence with the same length N
and fraction of symbols (1's), but with the symbols randomly
placed in the sequence. For this sequence we know that
$\gamma = 1$. We then consider instants of time $t_i$ equally
spaced in a logarithmic scale of t (in practice we consider
$t_{i+1}/t_i = 1.2$, with i integer and $t_0 = 1$). We then
estimate the local exponent as

$$\gamma_{\rm local}(t_i) = \frac{\log_{10}\Delta\sigma^2(t_{i+1}) - \log_{10}\Delta\sigma^2(t_i)}{\log_{10}(1.2)}.$$

For small t, $\gamma_{\rm local} = 1$, but for larger t
statistical fluctuations arise due to the finiteness of N, as
illustrated in Fig. S2(a). We choose $t_s$ as the smallest $t_i$
for which
$\{\gamma_{\rm local}(t_{i+1}), \gamma_{\rm local}(t_{i+2}), \gamma_{\rm local}(t_{i+3})\}$
are all outside [0.9, 1.1] (see Fig. S2(a)). We recall that our
primary interest is in the distinction between $\gamma = 1$ and
$\gamma \neq 1$. The procedure described above is particularly
suited for this distinction, and an exponent $\hat\gamma > 1.1$
obtained for large $t \le t_s$ can be confidently regarded as a
signature of superdiffusion (long-range correlation).

In Fig. S2(b) we verify that $t_s$ shows no strong dependence on
the fraction of 1's in the binary sequence (inset) and that it
scales linearly with N. Based on these results, a good estimate
of $t_s$ is $t_s = N/100$, i.e. the safe interval for determining
long-range correlation ends two decades before the size of the
text. This phenomenological rule was adopted in the estimate of
$\hat\gamma$ for all cases. The $t_s$ is only the upper limit,
and the estimate $\hat\gamma$ is performed through a
least-squares fit in the time interval
$t_{s_i} < t < t_s = N/100$, where $t_{s_i} \approx t_s/100$. In
practice, we select 10 different values of $t_{s_i}$ around
$t_s/100$ and report the mean and variance over the different
fittings as $\hat\gamma$ and its uncertainty, respectively.
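
The surrogate procedure for locating the safe upper limit $t_s$ can be sketched in plain Python. This is an illustration under assumptions, not the authors' implementation: the dispersion is estimated from all overlapping windows of length t of a single surrogate sequence, the times $t_i$ are rounded to integers (so the actual ratio $t_{i+1}/t_i$ replaces the nominal 1.2 in the denominator), and all function names are hypothetical.

```python
import math
import random
from itertools import accumulate


def log_spaced_times(t_max, ratio=1.2):
    """Integer times with t_0 = 1 and t_{i+1}/t_i close to `ratio`."""
    times, t = [], 1.0
    while t <= t_max:
        ti = round(t)
        if not times or ti > times[-1]:
            times.append(ti)
        t *= ratio
    return times


def walker_variance(cum, t):
    """Variance of the walker displacement over all windows of length t."""
    disp = [cum[j + t] - cum[j] for j in range(len(cum) - t)]
    mean = sum(disp) / len(disp)
    return sum((d - mean) ** 2 for d in disp) / len(disp)


def local_exponents(x, t_max):
    """Local slope of log10(variance) versus log10(t) between consecutive times."""
    cum = [0] + list(accumulate(x))
    times = log_spaced_times(t_max)
    var = [walker_variance(cum, t) for t in times]
    gammas = [
        (math.log10(var[i + 1]) - math.log10(var[i]))
        / math.log10(times[i + 1] / times[i])
        for i in range(len(times) - 1)
    ]
    return times[:-1], gammas


def safe_upper_limit(times, gammas, low=0.9, high=1.1):
    """Smallest t_i whose next three local exponents all leave [low, high]."""
    for i in range(len(gammas) - 3):
        if all(not (low <= g <= high) for g in gammas[i + 1:i + 4]):
            return times[i]
    return None  # the criterion was never met on the scanned range


rng = random.Random(0)
n = 20000
surrogate = [1] * (n // 10) + [0] * (n - n // 10)   # 10% of 1's
rng.shuffle(surrogate)                              # randomly placed symbols
times, gammas = local_exponents(surrogate, n // 4)
t_s = safe_upper_limit(times, gammas)
print("estimated t_s:", t_s)
```

The value obtained this way depends on the estimator and on the realization, which is why the text averages over many surrogate realizations before adopting the rule $t_s = N/100$.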



VI. LOWER BOUND FOR $\hat\gamma$ DUE TO
BURSTINESS

We start by clarifying the validity of the inequality

$$\hat\gamma \ge \hat\gamma_{A2}, \qquad (19)$$

where $\hat\gamma$ is the finite-time estimate of the total
long-range correlation $\gamma$ of a binary sequence x and
$\hat\gamma_{A2}$ is the estimate for the correlation due to the
burstiness (which can be quantified by shuffling x using the
procedure A2 of the main text). Equation (4) of the main text
shows that both burstiness
($\sigma_\tau/\langle\tau\rangle \to \infty$) and long-range
correlations in the sequence of $\tau_i$'s contribute to the
long-range correlations of a binary sequence x. While the
$\sigma_\tau/\langle\tau\rangle$ contribution is always positive,
the contribution from the correlation in the $\tau_i$'s can be
positive or negative. In principle, a negative contribution could
precisely cancel the contribution of
$\sigma_\tau/\langle\tau\rangle$ and violate the inequality (19).

Conversely, this inequality is guaranteed to hold if the
asymptotic contribution of the correlation in the $\tau_i$'s of x
to $\sigma_X^2$ is positive. We now show that this is the case
for the sequences we have argued to provide a good account of our
observations. Consider high in the hierarchy a renewal sequence x
with a given $\gamma > 1$ and broad tail in $p(\tau)$ (diverging
$\sigma_\tau/\langle\tau\rangle$). Adding many independent
non-overlapping sequences, we construct a lower level sequence
that still has long-range correlation, with the same exponent
$\gamma$ (see Sec. III above). For this sequence we know that the
broad tail in $p(\tau)$ has a cutoff $\tau_c$ and thus burstiness
gives no contribution to $\gamma$. Instead, $\gamma > 1$ results
solely from the correlations in the $\tau$'s, which are therefore
necessarily positive. It is natural to expect that this
positiveness of the asymptotic correlation extends to finite
times, in which case the (finite time) inequality (19) holds.
Indeed, for small $\tau < \tau_c$, the distribution $p(\tau)$ is
not strongly affected by the independent additions and thus for
$t < \tau_c$ a finite-time estimate $\hat\gamma$ will receive
contributions from both burstiness and correlations in the
$\tau$'s. Finally, we have directly tested the validity of
Eq. (19) by comparing $\hat\gamma$ of different sequences x to
the $\hat\gamma_{A2}$ obtained from the corresponding $x_{A2}$
(A2-shuffled sequences of x, see main text). The inequality (19)
was confirmed for every single sequence we have analyzed, as
shown by the fact that the $\hat\gamma_{A2}$ (red symbols) in
Fig. S3 are systematically below their corresponding
$\hat\gamma_x$ (black circles).

We now obtain a quantitative lower bound for $\hat\gamma$ using
Eq. (19). We consider a renewal sequence (in which case
$\hat\gamma = \hat\gamma_{A2}$) with an inter-event time
distribution given by

$$p(\tau) = C\,\tau^{-(4 - \gamma_{A2})}, \qquad \tau_{\min} < \tau < \tau_c, \qquad (20)$$

where $\tau_c$ is the cut-off time, $\gamma_{A2}$ is the
anomalous diffusion exponent for a renewal sequence with no
cutoff ($\tau_c \to \infty$), $\tau_{\min}$ is a lower cut-off
(we fixed it at $\tau_{\min} = 10$), and C is a normalization
constant. We obtain the lower bound for $\hat\gamma$ as a
function of $\sigma_\tau/\langle\tau\rangle$ by considering how
$\hat\gamma_{A2}$ and $\sigma_\tau/\langle\tau\rangle$ change
with $\tau_c$ in the model above. For short times
($t \ll \tau_c$) the corresponding walkers have not seen the
cutoff and their diffusion will be anomalous with exponent
$\hat\gamma_{A2} = \gamma_{A2}$. At longer times
($t \gg \tau_c$ [3H5]) the diffusion becomes normal,
$\hat\gamma_{A2} = 1$. Correspondingly, if the fitting interval
$t \in [t_{s_i}, t_s]$ used to compute the finite-time
$\hat\gamma_{A2}$ (see Sec. V) is all below $\tau_c$ (i.e.
$t_s < \tau_c$) we have $\hat\gamma_{A2} = \gamma_{A2}$, while if
the fitting interval is all beyond the cutoff (i.e.
$\tau_c < t_{s_i}$) we have $\hat\gamma_{A2} = 1$. When $\tau_c$
is inside the fitting interval we approximate $\hat\gamma_{A2}$
by linearly interpolating between $\gamma_{A2}$ and 1. Finally,
we can compute $\sigma_\tau/\langle\tau\rangle$ by directly
calculating the first and second moments of the distribution
(20). Particularly important are the values $s_1$ and $s_2$
obtained by evaluating $\sigma_\tau/\langle\tau\rangle$ at the
critical values of the cutoff $\tau_c = t_{s_i}$ and
$\tau_c = t_s$, respectively. Using the fact that
$\sigma_\tau/\langle\tau\rangle$ is a monotonically increasing
function of $\tau_c$ we can obtain explicitly the dependency of
$\hat\gamma$ on $\sigma_\tau/\langle\tau\rangle$. The
$\hat\gamma_{A2}$ for the case of a binary sequence with
distribution (20) is given by

$$
\hat\gamma_{A2} =
\begin{cases}
1 & \text{if } \sigma_\tau/\langle\tau\rangle < s_1, \\
1 + \dfrac{\sigma_\tau/\langle\tau\rangle - s_1}{s_2 - s_1}\,(\gamma_{A2} - 1) & \text{if } \sigma_\tau/\langle\tau\rangle \in [s_1, s_2], \\
\gamma_{A2} & \text{if } \sigma_\tau/\langle\tau\rangle > s_2.
\end{cases}
$$

The red dashed line in Fig. S3 (Fig. 3 of the main text) was
computed using the fitting range corresponding to the book wrnpc,
$t_{s_i} = 3 \cdot 10^2$, $t_s = 3 \cdot 10^4$ (see Sec. V), and
$\gamma_{A2} = 1.6$ (compatible with the $\hat\gamma$ observed
for words with large $\sigma_\tau/\langle\tau\rangle$).



VII. ADDITIONAL SHUFFLING METHODS

In addition to the shuffling methods presented in the main text,
we discuss here briefly two cases:

• Shuffle words

Mixing the word order kills correlations for scales larger than
the maximum word length [5J [7]. Even the blank space sequence B
becomes uncorrelated because its original correlations originate
(as in the case of all letters) from the correlation in the
$\tau_i$ and not from tails in $p(\tau)$.

• Keep all blank spaces in their original positions and fill
the empty space between them with:

1- two letters a, b, placed randomly with probabilities
$p_a = p$ and $p_b = 1 - p$.

2- the same letters of the book, placed in random positions.

By construction, correlation for the blank space is trivially
preserved. What do we expect for the other letters? The following
simple reasoning indicates that long-range correlation should be
expected asymptotically in both cases: any letter sequence x is
on top of the reverted blank space sequence $\bar{B}$; the
results in Sec. III above show that either the selected sequence
x or its complement y (such that $x + y = \bar{B}$) has
$\gamma = \gamma_{\bar{B}}$; and Eq. (15) above shows that any
randomly chosen x on top of $\bar{B}$ has
$\gamma = \gamma_{\bar{B}}$. In practice these exponents are
relevant only if the subsequence is dense enough in order for
$t_T$ in Eq. (18) above to be inside the observation range. For
the first shuffling method and for our longest book (wrnpc), we
obtain that only if p > 95.8% one finds $t_T < t_s$ = 1% of the
book size. Since the most frequent letter in a book has a much
smaller frequency (around 10%), we conclude that in practice all
sequences obtained using the second shuffling method have
$\hat\gamma = 1$ for all books of size smaller than
$100 \times t_T \approx 10^{11}$ symbols ($\approx 10^{7}$
pages).

These simple calculations show that $\gamma_{\bar{B}} > 1$ does
not explain the correlations observed in the letters of the
original text, as has been speculated in Ref. [8]. Their origin
is the long-range correlations on higher levels.
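
The additional shuffling methods of this section operate directly on the pre-processed text and can be sketched as follows. The sketch assumes the text contains only valid symbols with single blank spaces between words; the function names are hypothetical.

```python
import random


def shuffle_words(text, rng):
    """Mix the word order while keeping each word intact."""
    words = text.split(" ")
    rng.shuffle(words)
    return " ".join(words)


def fill_two_letters(text, p, rng):
    """Keep blank spaces in place; fill every other position with 'a'
    (probability p) or 'b' (probability 1 - p)."""
    return "".join(
        ch if ch == " " else ("a" if rng.random() < p else "b") for ch in text
    )


def fill_permuted_letters(text, rng):
    """Keep blank spaces in place; put the same letters of the text
    back in random positions."""
    letters = [ch for ch in text if ch != " "]
    rng.shuffle(letters)
    remaining = iter(letters)
    return "".join(ch if ch == " " else next(remaining) for ch in text)


rng = random.Random(1)
text = "this paper is a paper of mine"
print(shuffle_words(text, rng))
print(fill_two_letters(text, 0.3, rng))
print(fill_permuted_letters(text, rng))
```

The last two functions leave the blank space sequence untouched by construction, which is the property the argument above relies on.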

Fig. S1: Long-range correlation in texts encoded as vowels. Upper plot: detailed analysis in the book wrnpc with exponent
$\hat\gamma$ = 1.55 ± 0.05 (wrnpc). Lower plots: analysis of the remaining 9 books. The exponents $\hat\gamma$ are: 1.55 ± 0.05 (wrnpc),
1.18 ± 0.05 (alice), 1.23 ± 0.04 (sawyer), 1.20 ± 0.03 (pride), 1.48 ± 0.05 (missisipi), 1.26 ± 0.05 (jungle), 1.25 ± 0.04 (beagle),
1.45 ± 0.05 (moby), 1.61 ± 0.06 (ulysses), 1.26 ± 0.04 (quixote).

Fig. S2: (Color online) Determination of the time interval for the estimate of the long-range correlation exponent $\hat\gamma$. (a)
The dispersion $\Delta\sigma^2$ as a function of time t is shown as • for a random binary sequence of size $N = 10^6$ and 10% of 1's. The
local derivative is shown as ■ and agrees with the theoretical exponent $\gamma = 1$ until fluctuations start for long t (axis on the
right). The time $t_s$ denotes the end of the interval of safe determination of $\hat\gamma$, as explained in the text. (b) Dependence of $t_s$
on the size of the binary sequence N. The boxplots show the 5%, 25%, 50% (median), 75%, and 95% quantiles over M different
realizations of a random binary sequence. Black boxplots: M = 300 (M = 44 for $N = 10^7$) realizations equally divided between
frequency = 1%, 10%, 50%. Blue boxplots: M = 35 realizations equally divided between the frequencies of the three most
frequent letters ("_", "e", "t") and two most frequent words ("_the_", "_and_") of the shortest (Alice, N = 143,488) and longest
(War and Peace, N = 3,147,284) books. Inset: boxplots of $t_s$ for different frequencies and fixed sequence length $N = 10^6$ and
M = 100, showing no strong dependence on frequency.

Fig. S3: This figure corresponds to Fig. 3 of the main paper (horizontal axis: burstiness $\sigma_\tau/\langle\tau\rangle$) with the addition (red squares) of the estimated $\hat\gamma$ for sequences
$x_{A2}$ obtained by shuffling each one of the original sequences. The shuffling does not change $\sigma_\tau/\langle\tau\rangle$ and therefore the original
and shuffled sequences always appear on the same vertical line. The fact that the results for $x_{A2}$ are systematically below their
corresponding x is strong evidence of the validity of the inequality (19).
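
The two quantities compared in this figure can be computed from a binary sequence with a few lines of Python. The sketch assumes that procedure A2 amounts to shuffling the order of the inter-event times while keeping their values, which is consistent with the statement that the shuffling leaves $\sigma_\tau/\langle\tau\rangle$ unchanged; the function names are hypothetical.

```python
import random
from statistics import mean, pstdev


def inter_event_times(x):
    """Distances between consecutive 1's of a binary sequence."""
    positions = [j for j, value in enumerate(x) if value == 1]
    return [b - a for a, b in zip(positions, positions[1:])]


def burstiness(x):
    """Ratio sigma_tau / <tau> of the inter-event time distribution."""
    taus = inter_event_times(x)
    return pstdev(taus) / mean(taus)


def shuffle_inter_event_times(x, rng):
    """Assumed reading of A2: permute the inter-event times, keep their values.

    The first event stays where it is; the remaining events are placed
    again using the shuffled times.
    """
    positions = [j for j, value in enumerate(x) if value == 1]
    if len(positions) < 2:
        return list(x)
    taus = inter_event_times(x)
    rng.shuffle(taus)
    shuffled = [0] * len(x)
    j = positions[0]
    shuffled[j] = 1
    for tau in taus:
        j += tau
        shuffled[j] = 1
    return shuffled


rng = random.Random(3)
x = [0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0]
x_a2 = shuffle_inter_event_times(x, rng)
print(burstiness(x), burstiness(x_a2))  # identical by construction
```

Any difference between the exponents estimated on `x` and on `x_a2` then comes only from correlations between the inter-event times.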

Table S2: Correlation $\hat\gamma$ and burstiness $\sigma_\tau/\langle\tau\rangle$ obtained for
the different binary sequences in the indicated book.

Table S3: Correlation $\hat\gamma$ and burstiness $\sigma_\tau/\langle\tau\rangle$ obtained for
the different binary sequences in the indicated book.
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0.011 


1 

1 


of; 


0.03 


















348 


1.534 


0.015 


1 

1 


90 
zy 


0.04 
















sequence     count   $\sigma_\tau/\langle\tau\rangle$   error   $\hat\gamma$   error
_much_       338     1.112                              0.010   1.08           0.04
_country_    337     1.519                              0.011   1.28           0.05
_land_       318     1.387                              0.012   1.25           0.04
_must_       317     1.290                              0.009   1.12           0.04
_feet_       312     1.391                              0.013   1.23           0.04
_may_        311     1.118                              0.010   1.09           0.03
_species_    303     2.459                              0.022   1.55           0.05
_found_      303     1.217                              0.010   1.16           0.04
_me_         301     1.206                              0.012   1.07           0.04
_day_        301     1.375                              0.007   1.12           0.03



Table S4: Correlation $\hat\gamma$ and burstiness $\sigma_\tau/\langle\tau\rangle$ obtained for
the different binary sequences in the indicated book.



Book: jungle; N=783,014 



Original data 



^pmipnpp 




a /(t) 

u r/\i/ 


error 


1 


error 


1 


error 


7 


error 


VUWClO 


917050 


490 


090 

VJ.VJ^jVJ 


1 
x 


9fi 
zu 


0^ 

VJ.VJ<J 


















1 51 300 


0.404 


0.010 


1 
1 


51 
00 


0.05 


1 




0.05 


1 
1 


5"? 


n 
u 


05 
uu 




78161 


0.843 


0.002 


1 

X 




0.04 


1 

L 


HQ 


0.04 


1 

L 


04 




u 


04 

U4: 


t 


58475 


0.873 


0.002 


1 
X 


19 
oz 


0.05 


1 

1 


1 Q 


0.04 


1 
X 


04 
u^ 


n 
u 


01 

UO 


cl 


U*J\J\JtJ 


0.854 


0.002 


1 
1 


91 

Z X 


0.04 


1 

1 


1 7 


0.04 


1 
1 


OS 
uo 


n 
u 


01 
uo 


Q 


47796 


896 


001 

VJ.VJVJJ. 


1 
1 


1 8 


04 

VJ.VJ4: 


1 

1 


94 

Z4: 


04 

VJ.VJ4: 


n 

u 


Q7 
y 1 


n 
u 


04 
u^r 


n 


44497 


0.874 


0.002 


1 
1 


1 4 


0.04 


1 

1 


1 s 

10 


0.04 


1 

1 


1 9 
xz 


n 
u 


01 
uo 


h 


44473 


0.832 


0.002 


1 
1 


17 
( 


0.05 


1 

1 


97 

Z ( 


0.04 


1 

1 


1 7 


n 
u 


05 
uu 




40095 

4:VJVJZirJ 


906 

VJ . Zj vjvj 


001 


1 
1 


14 
o^r 


05 

VJ.VJ<J 


1 

1 


91 
zo 


05 

VJ.VJrJ 


1 

1 


1 1 


n 
u 


04 
u^r 


g 


37500 


0.941 


0.001 


1 
1 


19 
oz 


0.05 


1 

1 


17 


0.04 


1 

1 


07 


n 
u 


01 
uo 




3451 4 


888 

VJ .000 


009 

VJ •\J\JZj 


1 
1 


1 Q 

iy 


04 

VJ.VJ4: 


1 

1 


1 Q 


04 

VJ.VJ4: 


1 

1 


OQ 

uy 


n 
u 


04 

u^r 


d 


30491 


0.929 


0.001 


1 
1 


11 


0.06 


1 

1 


99 
zz 


0.04 


1 

1 


00 
uu 


n 
u 


04 

u^r 


1 


94876 

ZjIO 1 VJ 


1 059 

-L .VJfJt7 


001 

VJ.VJVJJ. 


1 
X 


90 


04 

U.VJ4: 


1 

1 


1 


04 


1 

1 


07 
u / 


n 
u 


01 
uo 


u 


17475 


0.943 


0.001 


1 
1 


1 fi 


0.04 


1 

1 


99 
zz 


0.04 


n 

u 


yy 


n 
u 


05 
uu 


w 


17213 


0.948 


0.002 


1 
L 


Ifi 
ou 


0.06 


1 

L 


10 
ou 


0.05 


1 

L 


Ofi 
uu 


n 

u 


01 
uo 


ni 


14754 


0.977 


0.001 


1 
1 


1 8 
10 


0.04 


1 

1 


94 

Z4: 


0.04 


1 
1 


04 

U4: 


n 
u 


01 
uo 




14148 


1.006 


0.001 


1 
1 


17 


0.05 


1 

1 


1 s 

10 


0.04 


1 
1 


1 
1U 


n 
u 


04 
u^r 


c 

& 


1 406Q 

i4uuy 


994 


009 

VJ.VJVJZ/ 


1 
1 


98 


06 

U.VJVJ 


1 
1 


97 

Z ( 


04 

VJ.VJ4: 


1 
1 


of; 
uu 


n 
u 


01 
uo 


f 


13862 


1.016 


0.002 


1 
X 


95 
zo 


0.04 


1 

1 


91 
z 1 


0.04 


1 
1 


OQ 

uy 


n 
u 


04 
u^r 


V 

J 


10868 


1.068 


0.002 


1 

J. 


9Q 

zy 


0.05 


1 

L 


1 fi 
1 u 


0.04 


n 

u 


Q5 

y 




u 


05 
uu 


n 
P 


9940 


1.074 


0.002 


1 
X 


9/L 

Z4: 


0.05 


1 

1 


94 

Z4: 


0.04 


1 

1 


OR 
uu 


n 
u 


04 
u^r 


the 


8930 


1.018 


0.003 


1 
X 


14 

04: 


0.04 
















O VI H 


7980 


958 

VJ .iJOO 


009 

VJ.lJVJZ/ 


1 
X 


97 

Z 1 


04 

VJ.VJ4: 
















_of_ 


4365 


1.113 


0.003 


1 

J. 


49 

4:Z 


0.07 
















_to_ 


4190 


1.077 


0.003 


1 

1 


90 
zu 


0.04 
















a 


4158 


1.152 


0.004 


1 

1 


99 
zz 


0.04 
















_he_ 


3311 


2.158 


0.011 


1 

_L 


fiO 


0.05 
















him 


1184 


2.009 


0.013 


1 

1 


49 

4:Z 


0.05 
















ii lrci Q 
-J ul 6 1B - 


1098 


2.077 


0.010 


1 

X 


48 


0.07 


















485 


6.141 


975 

VJ .£> 1 


1 

X 


54 


06 

VJ.VJVJ 
















man 


463 


1.301 


0.013 


1 

1 


97 

Z 1 


0.04 
















sequence     count   $\sigma_\tau/\langle\tau\rangle$   error   $\hat\gamma$   error
_said_       367     1.975                              0.019   1.38           0.04
_time_       356     1.209                              0.013   1.15           0.04
_men_        329     1.768                              0.011   1.33           0.05
_now_        325     1.077                              0.009   1.11           0.03
_day_        280     1.378                              0.021   1.15           0.04
_other_      279     1.244                              0.014   1.16           0.04
_place_      263     1.227                              0.013   1.17           0.04
_only_       261     1.042                              0.010   1.03           0.04
_before_     235     1.117                              0.010   1.09           0.03
_home_       229     1.759                              0.012   1.23           0.04

Table S5: Correlation $\hat\gamma$ and burstiness $\sigma_\tau/\langle\tau\rangle$ obtained for
the different binary sequences in the indicated book.



Book: missisipi; N=772,391 



Original data 



^pmipnpp 




a /(t) 

u r/\i/ 


error 


1 


error 


1 


error 


7 


error 


VUWClO 


935370 


445 

VJ .4:4:<J 


090 

VJ.VJZvVJ 


1 
1 


48 
40 


0^ 


















146786 


0.429 


0.009 


1 

X 


UJ 


0.05 


1 




0.05 


1 
1 


fi^ 


n 
u 


O^i 
uo 




76483 


0.850 


0.002 


1 

L 


40 

4U 


0.05 


1 

X 


1 1 

X X 


0.04 


1 

X 


1 n 


n 
u 


03 
uo 


t 




0.858 


0.002 


1 
1 


94 


0.04 


1 

1 


1 8 
10 


0.04 


1 

X 


09 
uz 


n 
u 


04 

U4 


cl 


51642 


0.859 


0.002 


1 
X 


93 


0.04 


1 

1 


1 8 

10 


0.04 


1 

X 


1 1 

X X 


n 
u 


04 

U4 


Q 


471 93 


890 

VJ .OfU 


009 

VJ .VJVJZj 


1 
1 


93 
zo 


04 


1 

1 


99 
zz 


04 

VJ.VJ4: 


1 

X 


0^ 


n 
u 


04 

U4 


n 


44064 


0.869 


0.002 


1 
1 


94 


0.04 


1 

1 


94 

Z4 


0.04 


1 

X 


08 
uo 


n 
u 


03 
uo 


j 


42750 


0.920 


0.001 


1 
1 


Qn 
ou 


0.04 


1 

1 


1 Q 


0.04 


1 

X 


1 1 

X X 


n 
u 


03 
uo 


G 



38QQ5 


940 


001 

VJ.VJVJ J. 


1 
1 


34 
04 


06 

U.VJU 


1 

1 


93 
zo 


04 

VJ.VJ4: 


1 

X 


90 
zu 


n 
u 


04 

U4 


h 


36904 


0.859 


0.002 


1 
1 


40 
4U 


0.05 


1 

1 


1 3 
10 


0.04 


1 

X 


90 
zu 


n 
u 


04 

U4 






91 9 

VJ . £7 J_Zj 


001 

VJ.VJVJ J_ 


1 
1 


34 

04 


05 
yj.yjo 


1 

1 


1 Q 


04 

VJ.VJ4: 


1 

X 


1 fi 

1U 


n 
u 


04 

U4 


d 


27682 


0.974 


0.001 


1 
1 


40 
4U 


0.06 


1 

1 


94 

Z4 


0.04 


1 

X 


0^ 

UJ 


n 
u 


03 
uo 


1 


24910 


1 055 

J- .XJOO 


001 

VJ.UVJJ. 


1 
1 


90 


04 


1 

1 


1 9 
1 z 


04 

VJ.VJ4: 


1 

X 


Ofi 
uu 


n 
u 


03 
uo 


LI 


17372 


0.947 


0.002 


1 
1 


90 


0.04 


1 

1 


1 7 


0.04 


1 

X 


07 
u / 


n 
u 


03 
uo 


w 


15554 


0.996 


0.002 


1 
L 


30 


0.04 


1 

X 


9^ 
zo 


0.04 


1 

X 


1 
xu 


n 

u 


03 
uo 


ni 


14940 


1.006 


0.002 


1 

_L 


9Q 

zc* 


0.04 


1 

X 


97 

Z 1 


0.04 


1 

X 


OQ 


n 
u 


04 

U4 




14884 


1.042 


0.001 


1 
1 


3^, 


0.05 


1 

1 


91 
z 1 


0.04 


1 

X 


91 
z X 


n 
u 


Ofi 
uu 


f 


14234 


1 006 

J. .VJVJU 


001 

VJ.VJVJ _L 


1 
1 


94 


05 

VJ.VJtJ 


1 
1 


1 4 


04 

VJ.VJ4: 


1 

X 


04 

U4 


n 
u 


03 
uo 


c 
& 


12890 


1.044 


0.001 


1 
X 


9fi 
zu 


0.04 


1 
X 


1 7 


0.04 


1 

X 


OQ 


n 
u 


03 
uo 


v 
J 


11994 


1.022 


0.002 


1 
L 


34 


0.04 


1 

X 


1 4 

X4 


0.04 


1 

X 


09 
uz 


n 
u 


03 
uo 


n 
P 


11087 


1.093 


0.002 


1 
X 


30 


0.05 


1 

1 


1 7 


0.04 


1 

X 


1 fi 
xu 


n 
u 


nc; 
uo 


the 


9091 


1.043 


0.003 


1 
X 


38 
00 


0.04 
















O ri A 

CHILI 


5898 


995 


003 

VJ.VJVJO 


1 
X 


34 
04 


0^ 

U.UO 
















of 

_U1_ 


4380 


1 033 


003 

VJ.VJVJO 


1 

X 


39 
oz 


05 

U.VJtJ 
















Q 
-Oi_ 


4057 


1 098 

J. .VJc70 


003 


1 

L 


99 
zz 


04 

VJ.VJ4: 
















_to_ 


3545 


1.095 


0.004 


1 
X 


94 

Z4 


0.04 
















in 


2555 


1.031 


0.004 


1 
1 


1 4 


0.04 
















wni 1 Id 


480 


1 559 

J_ . rJfJZj 


01 9 

VJ.VJJ.Zj 


1 
1 


9fi 
zu 


04 

U.VJ4: 
















river 


478 


2.176 


0.014 


1 
X 


43 

4:0 


0.06 
















_ W CLbCJ- — 


242 


1.899 


0.015 


1 
X 


38 
00 


0.05 
















she        239    2.055    0.022    1.44    0.06
boat       212    1.921    0.028    1.32    0.05
here       210    1.508    0.015    1.24    0.04
_night_    177    1.609    0.012    1.30    0.05
_can_      177    1.392    0.015    1.13    0.04
_go_       176    1.275    0.010    1.16    0.04
_head_     175    1.612    0.017    1.41    0.06
_pilot_    172    2.652    0.047    1.40    0.05
_long_     172    1.246    0.013    1.06    0.03
_first_    164    1.132    0.018    1.11    0.04
_miles_    162    1.816    0.030    1.49    0.05
Table S6: Correlation $\gamma$ and burstiness $\sigma_\tau/\langle\tau\rangle$ obtained for
the different binary sequences in the indicated book.
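
As an illustration of the burstiness column in these tables, the following minimal sketch computes $\sigma_\tau/\langle\tau\rangle$ for a binary sequence. It assumes that $\tau$ denotes the distance between consecutive 1s and that $\sigma_\tau$ is the population standard deviation of these distances. The function names `to_binary` and `burstiness` are hypothetical and are not taken from the paper.

```python
import math


def to_binary(text, condition):
    """Map each symbol of `text` to 1 if `condition(symbol)` holds, else 0."""
    return [1 if condition(ch) else 0 for ch in text]


def burstiness(binary_seq):
    """Return sigma_tau / <tau> for the gaps between consecutive 1s.

    Assumption: tau is the distance between successive 1s, and sigma_tau
    is the population standard deviation of those distances.
    """
    ones = [i for i, x in enumerate(binary_seq) if x == 1]
    taus = [b - a for a, b in zip(ones, ones[1:])]
    if not taus:
        raise ValueError("need at least two 1s to define an inter-event time")
    mean = sum(taus) / len(taus)
    var = sum((t - mean) ** 2 for t in taus) / len(taus)
    return math.sqrt(var) / mean


# Example: the vowel sequence of a short string.
vowel_seq = to_binary("call me ishmael", lambda ch: ch in "aeiou")
ratio = burstiness(vowel_seq)
```

Under these assumptions a perfectly periodic sequence gives a ratio of 0, and larger values indicate that the 1s are more clustered.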



Book: moby; N=1,169,850



Original data
$N_1$    $\sigma_\tau/\langle\tau\rangle$    error    $\gamma$    error    $\gamma$    error    $\gamma$    error

vowels


356037 

OOVJVJO 1 


0.441 


020 

VJ .VJ^VJ 


1 
1 


AX. 


05 

VJ.VJO 


















21 5Q3Q 


0.424 


009 

vj .vjvjty 


1 
1 


0^ 


05 

VJ.VJO 


1 


r i4 

o^x 


05 


1 

X 


^4 


n 
u 


uo 




1 16938 


0.859 


0.002 


1 
1 


9Q 


0.04 


1 


1 9 
1 z 


0.04 


1 

X 


0^ 
uo 


n 
u 


03 
uo 


t 


87882 


860 


002 

VJ .\JVJZj 


1 
x 


93 
zo 


04 

VJ.VJO 


1 


9^ 
zo 


04 


1 

X 


00 
uu 


n 
u 


03 
uo 


cl 


77820 


0.851 


0.002 


1 
1 


94 


0.04 


1 


99 
zz 


0.05 


1 

X 


O^i 
uo 


n 
u 


03 
uo 





69258 


900 

VJ . ZJVJVJ 


001 


1 
1 


97 

Z 1 


04 

VJ.VJ^± 


1 


1 6 
1U 


04 


1 

X 


OQ 

uy 


n 
u 


03 
uo 


n 




0.886 


0.001 


1 
1 


90 


0.04 


1 


90 
zu 


0.04 


1 

X 


07 
u / 


n 
u 


03 
uo 


j 


65349 


0.905 


0.001 


1 

X 


98 


0.04 


1 

X 


1 1 

X X 


0.04 


1 

X 


OQ 

uy 




u 


03 
uo 


g 


64148 


0.917 


0.001 


1 
1 


34 
o^x 


0.05 


1 

1 


31 

01 


0.04 


1 

X 


1 ^ 
xo 


n 
u 


04 
u^x 


h 


62824 


0.856 


0.002 


1 
1 


39 
oz 


0.04 


1 

1 


38 
00 


0.06 


1 

X 


91 
z X 


n 
u 


04 
u^x 




OAVJ i O 


900 

vj . zyvjvj 


002 

VJ .\J\JZj 


1 
X 


39 
oz 


04 


1 

1 


1 Q 


04 

VJ .U^ 


1 

X 


1 A 


n 
u 


04 

u^x 


1 


42733 


1.051 


0.001 


1 
X 


99 
zz 


0.04 


1 

1 


90 
zu 


0.04 


n 

u 


QQ 

yy 


n 
u 


03 
uo 


A 

KX 


381 Q9 


969 


001 


1 

1 


49 

4:Z 


05 

VJ.VJO 


1 

1 


1 Q 

iy 


04 


1 

X 


O^i 
uo 


n 
u 


03 
uo 


u 


26672 


0.968 


0.001 


1 

1 


93 
zo 


0.04 


1 

1 


OQ 
uy 


0.03 


1 

X 


09 
uz 


n 
u 


03 
uo 


m 


23243 


0.998 


0.001 


1 

X 


99 
zz 


0.04 


1 

X 


1 6 
x u 


0.04 


n 

u 


Q6 
y u 




u 


04 

U4: 


Q 


22482 


1.031 


0.001 


1 
X 


39 
oz 


0.04 


1 

X 


30 
ou 


0.04 


1 

X 


1 ^ 
xo 




u 


04 

U4: 


W 


22193 


0.957 


0.001 


1 
1 


9/L 

Z4: 


0.04 


1 

1 


93 
zo 


0.04 


1 

X 


06 
uu 


n 
u 


03 
uo 


f 


20812 


997 

VJ . Zj iJ t 


001 


1 
1 


33 


05 

VJ.VJO 


1 
1 


90 
zu 


04 

VJ .U^ 


1 

X 


01 
ux 


n 
u 


04 

u^x 


c 
6 


20801 


1.009 


0.001 


1 
1 


39 
oz 


0.04 


1 
1 


1 1 

X X 


0.03 


1 

X 


07 
u / 


n 
u 


04 

u^x 


1 1 

P 


17233 


1.057 


0.001 


1 
X 


93 
zo 


0.04 


1 
X 


1 3 

X 


0.04 


1 

X 


1 9 
xz 




u 


03 
uo 


V 

J 


16852 


1.037 


0.001 


1 
1 


9^, 

zo 


0.05 


1 

1 


99 
zz 


0.04 


n 
u 


Q8 

yo 


n 
u 


04 

u^x 


_the_ 


14404 


1.033 


0.002 


1 
1 


^xu 


0.04 
















of 


6600 


1 073 


003 

VJ .VJVJO 


1 
X 


47 
^x t 


06 

VJ.VJVJ 
















cLIld 


6428 


0.962 


0.002 


1 
X 


93 
zo 


0.04 
















S- 


4722 


1.137 


0.003 


1 
1 


34 

04; 


0.05 
















to 


4619 


1.023 


0.003 


1 
X 


1 ^ 


0.04 
















in 


4166 


1.021 


0.003 


1 
1 


30 
ou 


0.05 
















WT Tl £1 1 P 
_ W lldlC — 


1 0Q6 


2.162 


01 8 


1 
1 


^7 
1 


07 

VJ.VJ I 
















from 


1085 


1.143 


0.006 


1 
1 


1 ^ 
10 


0.04 


















476 

rt 1 VJ 


1.252 


007 


1 
X 


91 
z X 


04 

VJ.VJO 
















them 


474 


1.214 


0.012 


1 
X 


1 6 


0.04 


















453 


1.311 


0.009 


1 


24 


0.04 
















_old_     450    1.507    0.012    1.33    0.04
_we_      445    1.646    0.011    1.28    0.05
_ship_    438    1.522    0.012    1.31    0.04
_ahab_    436    3.056    0.021    1.53    0.06
_ye_      431    2.680    0.018    1.43    0.04
_who_     344    1.136    0.012    1.22    0.04
_head_    342    1.346    0.012    1.35    0.05
_time_    333    1.086    0.014    1.08    0.04
_long_    333    1.092    0.009    1.07    0.03
Table S7: Correlation $\gamma$ and burstiness $\sigma_\tau/\langle\tau\rangle$ obtained for
the different binary sequences in the indicated book.



Book: pride; N=659,408 







Original data                                                        Shuffling M1        Shuffling M2
$N_1$    $\sigma_\tau/\langle\tau\rangle$    error    $\gamma$    error    $\gamma$    error    $\gamma$    error


vowels




0.437 


U. U*1U 


1 90 


04 

U .U'l 












122194 


0.450 


008 


1.41 


0^ 

U .UJ 


1.41 


05 

u. UU 


1.41 


05 
u.uu 




69370 


0.828 


009 

U. UUZi 


1 19 


04 

U.U4: 


1.12 


04 

U.U4: 


1 08 

1 .UO 


04 

U.U4: 


t 


46645 

4:UU4:U 


0.872 


009 


1 10 

1.1U 


05 

U .UJ 


1 09 

1 .Ui7 


04 

U.U4: 


1.02 


03 

U.UO 


CL 


41 688 


0.849 


009 

U. UUZi 


1.11 


03 

U .Uu 


1 18 

I.IO 


04 

U.U4: 


1.04 


04 

U.U4: 





40041 


0.891 


001 


1 1 8 
1.10 


04 

U .U'l 


1 OQ 

1 .Ui7 


0^ 
u.uo 


1 nn 


04 

U.U1 




37830 


0.870 


009 


1 16 

_1_ . 1 U 


04 

VJ .\J L ± 


1 ^1 
l.Ol 


04 

U.U4: 


1 09 

1 .Ui7 


04 

U.U4: 


n 


37689 


0.884 


001 

u. UU1 


1 1 3 

_1_ . _L O 


04 

VJ .\J L ± 


1 16 

1 . 1U 


04 

U.U4: 


1 09 

1 .Ui7 


03 

U.UO 


h 


34067 


0.869 


009 

U. UUZi 


1 31 


04 

VJ .U4 


1.04 


04 

U.U4: 


1 10 

1 . 1U 


03 

U.UO 


Q 


"HI 1 4 

OOl 14: 


0.956 


001 

U. UU1 


1 06 


03 

VJ .UO 


1.11 


03 

U.UO 


1 06 

1 .UU 


03 

U.UO 


r 




0.882 


001 


1 1 R 
1.10 


04 

VJ .U'rfc 


1 DQ 

1 .uy 


04 

U.U1 


1 Ofi 

1 .UU 


04 

U.U1 




99303 

ZiZ<OUO 


0.917 


009 


1 1 ^ 


04 


1.11 


0^ 
u.uo 


1 0^ 

1 .UU 


0^ 
u.uo 


1 

1 


91 ^Q4 


1.036 


001 


1 1 Q 


O 1 ^ 
VJ .uu 


1 1 n 

J — LU 


04 

U.U1 


1 0^ 
1 .uo 


04 

U.U1 




1 4987 


0.971 


009 


1.26 


04 

VJ . U^t 


1 1 7 

1 . 1 1 


04 

U.U4: 


1 0^ 
1 .uo 


0^ 
u.uo 


in 


14764 


0.963 


009 

U. UUZi 


1 1 7 

1.1 ( 


04 

VJ . U^t 


1 1 7 

1 . 1 1 


04 

U.U4: 


1.02 


03 

U.UO 




1 3461 


1.005 


009 

U. UUZi 


1 27 


06 

VJ .UU 


1.20 


04 

U.U4: 


1.04 


04 

U.U4: 


y 


1 9706 

_L 1 UU 


0.992 


009 

U. UUZi 


1 37 


05 

U .UU 


1.20 


05 

u. UU 


1.04 


03 

U.UO 


w 


1 9^0^ 

1 .ZOUU 


0.949 


009 


1 9^ 


04 

U .U'rfc 


1 14 

_L . ±4: 


04 

U.U1 


-L .UU 


04 

U.U1 


f 


1 1 QQS 


0.988 


009 
u. 


1.23 


04 

VJ . U^t 


1 05 

1 .UU 


0^ 
u.uo 


1 05 

1 .UU 


0^ 
u.uo 


c 
6 


10031 


0.949 


0.002 


1.06 


0.04 


1.17 


0.04 


1.03 


0.03 


K 
u 


J7UOO 


0.943 


009 

U. UUZi 


1 19 


05 

U .UU 


1 08 

1 .uo 


03 

U.UO 


99 


04 

U.U4: 


the    4331    1.083    0.003    1.24    0.04










_LU_ 




0.945 


u.uuo 


1 1 1 

± . J. J. 


u.uo 










of 

_U1_ 


3609 


0.974 


003 


1.21 


04 










_aild_ 




0.859 


00^ 
u.uuo 


1 1 8 
1.10 


04 

U .U'l 










licr 


2225 


1.592 


01 ^ 

u. u 1 u 


1 31 

l.Ol 


04 










i    2068    2.915    0.014    1.46    0.05










at 


1 00 


1.071 


u.uuu 


1 1 n 


n 04 

U.U4 










mr 


786 


1.218 


007 


1.32 


06 

VJ .VJU 












601 


1.459 


0.010 


1.26 


0.04 












597 


1.192 


097 

u. UZ( 1 


1 1 7 

1.11 


06 

u .UU 










Ar 
_U1 _ 


300 

OUU 


1.026 


01 

U.U1U 


1 00 

1 . UU 


03 










_bennet_     294    2.047    0.034    1.37    0.07
_who_        284    1.148    0.010    1.06    0.03
_miss_       283    1.536    0.015    1.35    0.07
_one_        268    1.066    0.009    1.06    0.04
_jane_       264    1.741    0.016    1.29    0.06
_bingley_    257    3.166    0.019    1.45    0.08
_we_         253    1.546    0.013    1.26    0.04
_own_        183    1.078    0.015    1.06    0.04
_lady_       183    1.924    0.023    1.38    0.06











Table S8: Correlation $\gamma$ and burstiness $\sigma_\tau/\langle\tau\rangle$ obtained for
the different binary sequences in the indicated book.



Book: quixote; N=2,080,431 







Original data                                                        Shuffling M1        Shuffling M2
$N_1$    $\sigma_\tau/\langle\tau\rangle$    error    $\gamma$    error    $\gamma$    error    $\gamma$    error


vowels


fi38889 


0.430 


090 

U.UZU 


1 
1 


9fi 
zu 


04 




















409964 


0.415 


009 


1 
1 


34 


O^S 


1 
1 




n 
u 


0^ 
uo 


1 

1 


■^4 


n 
u 


O^i 
uo 


p 




0.840 


009 


1 

_L 


9^1 
zu 


04 


1 
± 


zo 


n 


04 

U4: 


1 

L 


03 
uo 


n 

u 


03 
uo 


t 


1 ^71 93 


0.867 


009 


1 
1 


9fi 
zu 


04 


1 
1 


9 r i 
zo 


n 

u 


04 
u^ 


1 

1 


03 
uo 


n 

u 


03 
uo 


Q 

d 


1 38706 

-LOO 1 UU 


0.841 


009 


1 
± 


9^ 
zo 


04 


1 
1 


98 
zo 


n 

u 


04 
u^ 


1 

1 


OQ 
uy 


n 

u 


03 
uo 





1 3^41 


0.881 


001 


1 
1 


94 
z^ 


04 


1 
1 


9Q 
z y 


n 

u 


04 
u^ 


1 

1 


03 
uo 


n 

u 


03 
uo 


h 


117821 


0.852 


009 


1 
1 


93 
zo 


04 


1 
1 


1 


n 

u 


04 
u^t 


1 

1 


Ofi 


n 

u 


03 
uo 


IT 


1 1 5898 


0.866 


009 


1 

_L 


93 
zo 


04 


1 
± 


1 7 


n 


04 

U4: 


1 

± 


uo 


n 

u 


04 




112746 


0.881 


001 


1 
1 


9fi 
zu 


04 


1 
1 


1 Q 

ry 


n 

u 


04 
u^ 


1 

1 


04 
u^r 


n 

u 


03 
uo 


g 


1 0fiQ7Q 


0.935 


001 


1 
1 


98 
zo 


05 


1 

X 


97 

Z 1 


n 

u 


04 
u^ 


1 

1 


Of! 
uu 


n 

u 


03 
uo 


r 


Q9^01 


0.910 


001 


1 
1 


97 
z 1 


04 


1 

1 


98 
zo 


n 

u 


04 
u^ 


1 

1 


Ofi 
uu 


n 

u 


03 
uo 


A 

LI 




0.929 


001 


1 
1 


34 


04 


1 

1 


99 
zz 


n 

u 


04 
u^ 


1 

1 


03 
uo 


n 

u 


03 
uo 


1 

1 


fi91 07 


1.108 


009 


1 
± 


94 

Z^i 


04 


1 

1 


98 
zo 


n 

u 


O^i 
uo 


1 

1 


01 

Ul 


n 

u 


03 
Uo 


11 




0.949 


001 


1 
± 


93 
zo 


04 


1 

± 


1 7 


n 


04 

U4: 


1 

± 


00 

uu 


n 

u 


04 


111 


42945 


0.992 


0.001 


1 
± 


91 
z ± 


0.04 


1 

± 


zo 


n 


04 

U4: 


1 

L 


04 


n 

u 


O 1 ^ 
uo 


f 


38559 


0.977 


001 


1 
1 


94 
z^ 


04 


1 

1 


94 

Z4: 


n 

u 


04 
u^ 


1 

1 


uo 


n 

u 


03 
uo 


w 


38909 


0.986 


001 


1 
1 


9Q 
zy 


04 


1 

1 


91 

Z J_ 


n 

u 


04 
u^ 


1 

X 


uo 


n 

u 


03 
uo 


c 


37H09 


0.984 


001 


1 
1 


91 

z i_ 


04 


1 

1 


9fi 
zu 


n 

u 


04 
u^ 


1 

X 


08 
uo 


n 

u 


03 
Uo 


{T 

6 


31 Q97 


0.988 


001 


1 
± 


99 
zz 


04 


1 

1 


9^ 
zo 


n 

u 


04 
u^ 


1 

1 


03 
uo 


n 

u 


03 
Uo 


y 


31 053 


1.048 


001 


1 

_L 


9^ 
zo 


04 


1 

± 


9fi 
zu 


n 


04 

U4: 


1 

L 


0^ 
uo 


n 

u 


04 


P 


93880 


1.069 


001 


1 

_L 


93 
zo 


04 


1 


9fi 
zu 


n 


04 

U4: 


1 


1 1 


n 

u 


03 
uo 


the 


20652 


1.050 


0.002 


1 

_L 


3fi 
ou 


0.04 


















n A 


1 6835 


0.908 


009 


1 
1 


99 
zz 


05 


















to 

_IU_ 


1 31 84 

10104: 


1.031 


009 


1 
± 


9fi 
zu 


04 


















nf 


1 91 73 


1.033 


009 


1 
1 


9fi 
zu 


04 


















that 




1.023 


009 


1 
1 


91 

Z _L 


04 




















671 6 


1.023 


009 


1 

_L 


1 1 

_L _L 


03 


















hv 


9069 


1.042 


004 


1 
± 


OQ 

uy 


03 


















q ti n ( 1 n 


9063 


3.762 


095 


1 

_L 


fi3 
uo 


05 


















nr 
_U1 _ 


9048 


1.154 


004 


1 
1 


1 <i 
±0 


04 


















_L[LllAULL_ 


9009 


3.214 


01 


1 
1 


00 


Ofi 


















_U L11C1 _ 


609 


1.072 


008 




11 


04 


















Icnip'lit 

XV1±1 till L 


606 


2.175 


0.016 


1 


43 


0.05 


















_take_       546    1.195    0.008    1.14    0.04
_master_     545    1.720    0.013    1.38    0.04
_thy_        510    2.252    0.017    1.35    0.05
_senor_      509    1.632    0.009    1.28    0.04
_worship_    470    2.337    0.012    1.37    0.04
_here_       467    1.237    0.007    1.14    0.04
_god_        467    1.169    0.011    1.12    0.04
_way_        466    1.056    0.006    1.07    0.04






















Table S9: Correlation $\gamma$ and burstiness $\sigma_\tau/\langle\tau\rangle$ obtained for
the different binary sequences in the indicated book.



Book: sawyer; N=369,222 



Original data
$N_1$    $\sigma_\tau/\langle\tau\rangle$    error    $\gamma$    error    $\gamma$    error    $\gamma$    error


vowels


1 1 0026 

_L _L IJVJ 


432 


020 

VJ.VJ^jVJ 


1 
± 


21 
zo 


04 

VJ.VJ4: 


















71180 


0.402 


0.010 


1 
1 


^0 
ou 


0.05 


1 


r iO 


0.05 


1 
1 


^0 
ou 



u 


Uo 




35603 


0.864 


0.002 


1 

1 


in 


0.04 


1 

1 




0.05 


n 

u 






u 


D4 

U4: 


t 


28825 

j-J U CJ !j 


0.858 


0.002 


1 
1 


1 fi 


0.04 


1 

1 


21 
zo 


0.04 



u 


QQ 




u 


04 
U4 


cl 


23478 


0.858 


0.002 


1 
1 


1 7 


0.04 


1 

1 


ni 


0.03 


1 

1 


1 1 
10 



u 


04 
U4 


Q 


23192 


898 

VJ .Ofu 


001 

VJ.VJVJJ. 


1 
1 


22 


04 

VJ.VJ4: 


1 

1 


or 

uu 


01 

VJ.VJO 



u 


Q8 

yo 



u 


04 
U4 


n 


20146 


0.866 


0.002 


1 
1 


1 2 


0.04 


1 

1 


21 
zo 


0.04 


1 

1 


07 
u / 



u 


01 
uo 


h 


19565 


0.861 


0.002 


1 
1 


1 8 


0.04 


1 

1 


1 1 

1 1 


0.04 


1 

1 


D7 

U 1 




u 


D4 

U4: 


j 


18811 


0.910 


0.002 


1 
1 


1 fi 


0.05 


1 

1 


21 
zo 


0.04 


1 

± 


O r i 
uo 




u 


01 

UO 


g 


17716 


0.951 


0.001 


1 
1 


1 Q 


0.04 


1 

1 


1 1 

10 


0.04 


1 

1 


OX 
Uo 



u 


01 
uo 




1 5247 


917 

VJ . ZJ _L 1 


002 

VJ •\J\JZj 


1 
1 




05 

VJ.VJ<J 


1 

1 


1 1 
10 


04 

VJ.VJ4: 


1 

1 


1 
1U 



u 


0^ 
Uo 


LI 


1 4850 


950 

VJ .ZJO\J 


002 

VJ.VJVJZrf 


1 
1 


2D 


04 

VJ.VJ4: 


1 

1 


20 
zu 


04 

VJ.U4: 


1 

1 


01 

Ul 




u 


01 
uo 




12136 


1 086 


002 

VJ.VJVJZ/ 


1 
1 


1 7 


04 

U.VJ4: 


1 

1 


1 4 


04 

VJ •\J L ± 


1 

1 


OR 
uu 




u 


01 
uo 


u 


8942 


0.949 


0.002 


1 
1 


1 8 
10 


0.04 


1 

1 


07 

U f 


0.04 


1 

1 


OR 
uu 




u 


01 
uo 


w 


8042 


0.949 


0.002 


1 

J. 


1 1 


0.03 


1 

1 


1 8 
10 


0.04 


1 

1 


1 2 

1Z 




u 


04 

U4: 


HI 


7135 


0.977 


0.002 


1 
1 


22 
zz 


0.04 


1 

1 


1 8 

10 


0.04 


1 

1 


02 
uz 




u 


01 

UO 


v 
J 


6725 


1.043 


0.002 


1 
1 


Ifi 


0.04 


1 

1 


04 


0.03 


1 

1 


00 
uu 



u 


04 

U4 


c 

& 


6606 


1.041 


002 

VJ.VJVJZj 


1 
1 


1 fi 


04 

U.VJ4: 


1 
1 


1 ^ 
10 


05 

VJ.VJrJ 


1 

1 


or 
uu 



u 


01 
uo 




6497 


1.030 


0.003 


1 
1 


21 
zo 


0.05 


1 
1 


OQ 
uy 


0.05 


1 

1 


1 R 

1U 



u 


04 

U4 


f 


6004 


1.047 


0.003 


1 
1 


22 
zz 


0.04 


1 

1 


1 1 
1 1 


0.03 


1 

1 


(12 
uz 




u 


ni 

uo 


b 


4958 


0.959 


0.003 


1 
1 


1 n 

1U 


0.04 


1 

1 


2^ 
zo 


0.04 


1 

1 


02 
uz 




u 


01 
uo 


the 


3703 


1.154 


0.004 


1 
1 


1^1 


0.04 
















ft n r\ 


31 05 

O J-VJ<J 


1 008 


003 


1 
± 


21 

Z 1 


04 

U.VJ4: 
















a 


1863 


1.085 


0.005 


1 

J. 


on 
zu 


0.04 
















_to_ 


1727 


1.054 


0.004 


1 
1 


1 4 

1*1 


0.03 
















_of_ 


1436 


1.127 


0.005 


1 
1 


21 

Z 1 


0.04 
















_he_ 


1197 


1.770 


0.015 


1 
1 


4fl 

4:U 


0.04 
















torn 


689 


1.740 


0.014 


1 
± 


1Q 

oy 


0.06 
















witli 


647 


1.068 


0.008 


1 
1 


1 ^ 


0.04 
















if 


237 


1.404 


01 1 


1 
± 


21 
zo 


04 

VJ.VJ4: 
















liuck 


223 


3.228 


0.024 


1 
± 


4fi 
^±u 


0.07 
















boys       155    1.767    0.019    1.24    0.06
_did_      150    1.336    0.018    1.22    0.04
_joe_      133    2.248    0.051    1.38    0.06
_never_    131    1.185    0.017    1.14    0.04
_boy_      122    1.788    0.054    1.29    0.06
_back_     121    0.968    0.015    1.04    0.03
_off_       99    1.335    0.019    1.12    0.04
_night_     98    2.025    0.057    1.29    0.04
_other_     96    1.145    0.019    1.12    0.03
_becky_     96    2.701    0.036    1.55    0.10

















Shuffling M1



Shuffling M2 



Table S10: Correlation $\gamma$ and burstiness $\sigma_\tau/\langle\tau\rangle$ obtained for
the different binary sequences in the indicated book.









Book: ulysses; N=1,453,586

            Original data                                                        Shuffling M1        Shuffling M2
Sequence    $N_1$    $\sigma_\tau/\langle\tau\rangle$    error    $\gamma$    error    $\gamma$    error    $\gamma$    error


vowels


440B7B 


n 
u 


4^R 
400 


n 090 

u. uzu 


1 R1 

1 . U 1 


n 
u 


OR 

uo 




















9B^304 


n 
u 


43R 
40U 


00Q 


1 78 

1.10 


n 
u 


Ofi 
uu 


1 
1 


78 


n 
u 


OB 
Uu 


1 
1 


78 


n 
u 


0B 
uu 




141465 


n 


8^ 
000 


009 


1.28 


n 

u 


04 


1 

L 


^0 
ou 




u 


OB 
uu 


1 
± 


1 1 

X X 





03 
uo 


t 


1 001 83 


n 

u 


Q04 
yu4 


001 

u. UUl 


1 ^4 


n 
u 


07 
u / 


1 
1 


^7 
/ 


n 
u 


OB 
Uu 


1 
1 


0^, 

UO 


n 

u 


03 
Uo 


CL 


Q31 9Q 

i/O IZjC/ 


n 
u 


877 


001 


1.32 


n 
u 


O^i 
uo 


1 
1 


zo 


n 
u 


OB 
Uu 


1 
1 


OR 
UU 


n 

u 


03 
uo 





Q1 403 


n 
u 


you 


001 


1 1 Q 
1 . x y 


n 


04 


1 
1 


00 


n 
u 


0^ 
uo 


1 
1 


1 

1U 


n 

u 


04 

U4: 




SI 407 

O 14:U i 


n 
u 


Q1 4 


001 


1.44 


n 
u 


07 
u / 


1 
1 


00 


n 
u 


OB 

Uu 


1 

X 


91 


n 

u 


0^i 

UO 


n 


soi 38 


n 
u 


807 


001 


1 34 

1 . 01 


n 
u 


Ofi 
UO 


1 
1 


97 


n 
u 


04 

U4 


1 

1 


9fi 
zu 


n 

u 


0B 
uu 


s 


7BQ1 ^ 


n 
u 


q^o 

you 


001 


1 40 


n 
u 


OR 
uu 


1 
1 


^1 
01 


n 
u 


OB 
Uu 


1 

1 


9^ 
zo 


n 

u 


0B 
uu 


h 


79^0 

1 ZOOU 


u 


QOR 
yuu 


009 

u. uuz 


1 B1 

1 . u 1 


n 
u 


OR 
uu 


1 
1 


9^ 


n 
u 


0^ 

Uo 


1 

1 


44 


n 

u 


OR 
uo 


r 


Uc/OOZi 


n 
u 


Q1 8 
y 10 


001 


1 4Q 
1 ,iy 


n 
u 


OR 
uu 


1 
1 


99 


n 
u 


04 

U4 


1 

1 


^0 
ou 


n 

u 


0B 
uu 






1 

1 


074 
U t 4 


001 

U.UUl 


1.41 


n 
u 


OR 
uu 


1 
1 


oy 


n 
u 


OB 
Uu 


1 

1 


1 Q 

iy 


n 

u 


0^i 
uo 


CI 


4Q0Q3 


n 
u 


qso 
you 


001 

U.UUl 


1 44 


n 
u 


O 1 ^ 

uo 


1 
1 


1 7 


n 

u 


04 
U4 


1 

X 


04 

U4 


n 
u 


04 
u^ 


u 


OO979 


n 
u 


Q89 

yoz 


001 

U.UUl 


1 9^ 

_L . ZO 


n 
u 


04 

U4: 


1 
1 


^R 
ou 


n 
u 


OB 
Uu 


1 

X 


1 fi 

1U 


n 

u 


04 

U4: 


m 


31 ^3^ 
J. ooo 


1 

_L 


uzo 


001 

U. UUl 


1.29 


n 
u 


O 1 ^ 
UO 


1 

_L 


00 




u 


uo 


1 

X 


0^ 
uo 





03 
uo 


p 

L< 


9Q894 


1 
X 


079 
U f z 


001 

U. UUl 


1.62 


n 
u 


07 
u / 


1 
1 


^1 
01 


n 
u 


04 

U4 


1 

1 


ou 


n 

u 


07 
u / 


c 
6 


97791 


1 
1 


031 


001 

U. UUl 


1 3fi 
1 . ou 


n 
u 


OR 
uu 


1 
1 


94 

Z'i 


n 
u 


0^ 
uo 


1 

1 


1 Q 


n 

u 


04 

U^l 


f 


ZUUOO 


1 
X 


09^, 


001 

U. UUl 


1 30 
1 . ou 


n 
u 


O^i 
uo 


1 
1 


99 
zz 


n 
u 


04 

U4 


1 

1 


1 9 

± z 


n 

u 


03 
uo 




26164 


1 

X 


o^r 

uou 


001 

U. UUl 


1 ^3 

1 .00 


n 
u 


07 
u / 


1 
1 


^9 
oz 


n 
u 


0^ 
uo 


1 

1 


1 


n 

u 


0B 
uu 


y 


24251 


I 


039 

UOZ 


001 

U. UUl 


1 3fi 
1 . ou 


n 

U 


05 

UO 


I 


1 5 

-LO 




U 


03 
uo 




01 




U 


03 
uo 


P 


22440 


1 

_L 


1 94 


0.002 


1.46 


n 

U 


OB 

UU 


1 

_L 


97 



u 


uo 


1 


9^ 
zo 



u 


0^ 
uo 


the 


14952 


1 

_L 


071 

U 1 i 


0.002 


1.44 


n 

U 


OB 
uu 


















nf 

_U1_ 


8141 


1 
X 


1 91 


003 

u. uuo 


1 63 
1 . uo 


n 

u 


07 


















_CL11U._ 


721 7 


1 

_L 


1 B7 

1U ( 


003 

u. uuo 


1 53 

1 .00 


n 
u 


O 1 ^ 
uo 





















UO IO 


1 
1 


1 44 


003 

u. uuo 


1.24 


n 
u 


04 

U4: 


















in 


4QB3 


1 
1 


1 ^7 


003 
u. uuo 


1 38 
1.00 


n 
u 


07 


















111 


4Q4B 


1 

_L 


009 


009 
u. uuz 


1 1 6 

1 . 1 U 


n 

U 


04 

Ul 




















51 n 

O 1U 


1 
1 


4B1 


01 3 
u. u 10 


1 27 

1 . _ / 


n 

u 


04 


















qf prv V> 


5f)5 

OUO 


4: 




0Q9 

U. \JUZj 


1.64 


n 

u 


OB 


















_W6_ 




9 
Z 


497 

4:Z ( 


OR^ 


1 9"i 

X . ZO 


n 


04 

U'l 


















_maii_ 


41 ^ 

4:10 


1 

_L 


3&& 


01 Q 
u.uiy 


1 1 fi 


U 


04 

Ul 


















into        330    1.179    0.011    1.14    0.04
eyes        329    1.921    0.013    1.21    0.04
_where_     310    1.214    0.014    1.11    0.03
_hand_      308    1.295    0.017    1.18    0.04
_street_    293    1.394    0.013    1.21    0.04
_our_       291    1.556    0.018    1.23    0.04
_first_     278    1.306    0.011    1.19    0.04
_father_    277    1.631    0.013    1.62    0.05
_day_       250    1.131    0.012    1.10    0.03
_just_      249    2.014    0.012    1.20    0.04



















Table S11: Correlation $\gamma$ and burstiness $\sigma_\tau/\langle\tau\rangle$ obtained for
the different binary sequences in the indicated book.